Developers deserve a better search engine
Bas van Essen, Friday, September 21, 2018

In the ongoing struggle to maximise the productivity of software development, the focus is not only on serverless computing, optimising agile workflows and creating a developer-centric culture. Researchers signal that we also shouldn’t overlook the (inefficient) ways in which programmers have to search for external online information.

Let’s first approach this topic from a quantitative perspective: how much time do developers spend on online search? We take as a starting point a recent study by researchers from the computer science faculties of universities in Hangzhou, Vancouver, Kingston, Singapore and Canberra. Their paper ‘What do developers search for on the web?’ looks back on two weeks of automatically tracking the computer activity of 60 software developers at the outsourcing companies Insigma Global Services and Hengtian.

For context: Insigma has approximately 500 employees and carries out projects for the Chinese web giants Alibaba and Baidu. Hengtian has roughly 2000 employees, who develop software for US and European corporations such as State Street Bank and Cisco. Both companies mainly employ developers skilled in Java, .NET and C/C++, and together these employees have an average of 3.3 years of IT experience.

The time spent on search

Computer activity tracking shows that Insigma’s and Hengtian’s developers spend 15% of their time each day on online search. One of them even spent 35% of her time on online search, but she explained that she had just graduated from university and spent a lot of time searching online for introductions to the techniques used in her projects.

The researchers found that the developers work at their computers for five hours a day on average. That means that, on the whole, they dedicate 45 minutes of their daily working hours to search engines.

[Figure: time developers dedicate to online search]

As figure 2 shows, most of Insigma’s and Hengtian’s developers performed 100–500 queries over two weeks. That comes down to 10 to 50 queries per working day, carried out within half an hour to an hour a day.

[Figure: number of daily search tasks per developer]

There are of course differences in how many minutes developers spend on online search each day. Are researchers assessing a company with many experienced developers or many junior ones? Are the developers working on a project basis or focusing long-term on one platform? Many variables could influence the time spent.

For example, last year’s study ‘Patterns of developers behaviour: A 1000-hour industrial study’ tracked C++ developers in one company developing real-time apps for controlling mechanical devices used in the manufacturing and agriculture industries. It estimated that the programmers spent only 10 minutes per day on online search. But there was no variety in developers, types of company or projects, and the number of participants was very low: six programmers.

Let’s base our estimate on research that really tried to capture the common denominator. A recent study called ‘The Work Life of Developers: Activities, Switches and Perceived Productivity’ took a more balanced approach and monitored developers from four different companies of varying size, in different locations and project stages, using different kinds of programming languages, and working on different kinds of products for different kinds of customers. It concluded that developers on average spend 11.4% of their eight daily working hours on work-related browsing.

In other words, in this case the results show that developers dedicate roughly 55 minutes a day (11.4% of 480 minutes) to online search. That is a considerable amount of time, taking into account that developers can usually work effectively at the computer for five hours a day.

[Figure: programmers’ time spent on work activities]

Part of the output of the study report ‘The Work Life of Developers: Activities, Switches and Perceived Productivity’
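
The estimates above are plain share-of-the-working-day calculations. As a minimal sketch (the function and variable names are ours, not the studies’, and the ten working days in two weeks are an assumption), they can be reproduced like this:

```python
def minutes_on_search(share_of_time: float, hours_per_day: float) -> float:
    """Convert a share of the working day into minutes per day."""
    return share_of_time * hours_per_day * 60

# 'What do developers search for on the web?': 15% of five hours at the computer
insigma_hengtian = minutes_on_search(0.15, 5)

# 'The Work Life of Developers': 11.4% of an eight-hour working day
work_life = minutes_on_search(0.114, 8)

# 100-500 queries over two weeks, assuming ten working days in that period
working_days = 10
queries_per_day = (100 / working_days, 500 / working_days)

print(round(insigma_hengtian), round(work_life, 1), queries_per_day)
```

Both studies land in the same range of 45 to 60 minutes a day, even though they start from different assumptions about the length of the working day.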

The output

If we can fairly safely conclude that a considerable number of developers spend 45 to 60 minutes a day on online search, we should also assess whether the results of their queries reach the desired quality level.

Let’s first summarise why programmers use search engines. The analysts behind ‘What do developers search for on the web?’ conclude that most of the queries are related to:

  • explanations of terminology, such as PaaS, SaaS and blockchain
  • explanations about using a new (feature of a) programming language
  • reusable code snippets from third parties

‘How do developers search for code? A case study’, in which staff members of the universities of Iowa and Nebraska report on 27 Google developers searching for code, concludes that they search for examples more than anything else. It’s about getting case examples and/or sample code.

Additionally, according to ‘What do developers search for on the web?’, a considerable share of the searches is also related to:

  • (how to use) third-party libraries/services/APIs
  • understanding purpose and scope of a project, for example assignments related to software for ‘pensions’ or ‘stock exchange processes’
  • best practices, e.g. about ‘architecture of a project’, or ‘comparing a programming solution with alternative solutions’
  • explanations for certain exceptions/error messages
  • software solutions to common programming and configuration bugs
  • examples and guidance on how to use Operating System Command Line Interfaces
  • examples and guidance on how to form SQL statements
  • database optimisation solutions
  • remembering syntactic details

Google

These observations show that code-related search is clearly a priority for programmers, which makes sense. And there’s no doubt where developers carry out this code search: Google. To meet the increasing need for online code search, several commercial search engines have been built in the past, such as Google Code Search and Black Duck Open Hub CodeSearch. But they are now obsolete. Programmers have turned en masse to general-purpose engines, of which Google is by far the most frequently used.

From this perspective, a scientific report named ‘Evaluating How Developers Use General-Purpose Web-Search for Code Retrieval’ shows us that code-related googling takes nearly twice as much time as non-code-related googling. Scientists from Corvalius and the universities of Virginia, North Carolina and Clemson assessed 149,610 queries from 310 programmers and saw that completing a code search task takes around 2 min 53 sec, while finishing a non-code search task takes 1 min 35 sec.

So, relatively speaking, software developers googling for code have to put in a lot of effort before they find the result they want. The process demands more time, more clicks on results and more query modifications than non-code search. That is somewhat strange, as research also shows that developers searching for code attributes are less likely to click on a result. They just copy the code that appears on the Google results page itself.
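
The comparison can be checked with a quick calculation. This is only a sketch using the two durations quoted above; the variable names are ours:

```python
# Durations reported in the study, converted to seconds
code_search_s = 2 * 60 + 53      # 2 min 53 sec
non_code_search_s = 1 * 60 + 35  # 1 min 35 sec

# How many times longer a code search task takes than a non-code one
ratio = code_search_s / non_code_search_s
print(f"{ratio:.2f}")
```

The ratio comes out at about 1.8, so a code search task takes a little under twice as long.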

Worth trying: Sourcegraph lets you search and explore all of your organization’s code on the web

So although Google customizes search results, it does not personalize search outcomes in favour of software developers. Programmers could benefit if Google used their code history to prioritize the results. Developers might also be better served if Google were aware of their needs by, for example, providing additional operators or being cognizant of other tools they are currently using. But for now, Google only smooths the search for code internally, in service of its own programmers.

A better code search engine

In ‘What do developers search for on the web?’, which also includes interviews with software developers, the annoyance with Google’s code search performance is quite noticeable. The programmers offer the search giant some recommendations:

“Google obviously doesn’t always handle code well, e.g. underscores etc. It would be useful if it stopped autocorrecting that, or if there was something you can e.g. append to the URL.”

“Allow special characters in search queries! When I search for C++, don’t search for C with two spaces after it!”

“Allow me to search in all worldwide open source code by writing code expressions. Allow me to search in all code repositories together with one simple textbox. Allow me to search for weird symbols and operators treating them literally. Show good code search results prioritizing things that semantically make sense in code: if I search for a method name within a repository, show me the method definition, not the use.”

“Improve tools for searching code online. All software hosting sites (i.e. Github/Bitbucket/grepcode/etc.) have very poor tools for searching code. Provide a way to jump directly from a stack trace to the matching code (same revision/file/line number) if the code is hosted online. Provide tools for “jump to definition” in online code browsers. Basically make online code as easy to browse as within an IDE.”

Participants also found it difficult to find public datasets to test a newly developed algorithm or system, as Google can’t locate domain-specific datasets.

Based on the comments, a developer-friendly web search engine should:

  • support web search with software engineering (SE) related symbols and terminologies,
  • allow developers to specify that search results should originate from code repositories and SE related websites,
  • integrate search functionalities into IDEs,
  • prioritize search results by considering their semantic meanings relative to the particular context in which a developer is working, and
  • understand software engineering terminology and domain-specific concepts, so that it can identify the useful software information sites and prioritise results from these sites.

Google is of course aware of these challenges. In ‘How do developers search for code? A case study’, involving Google programmers themselves, the search company got the following recommendations:

1. Google should focus on developers’ questions

“It should focus on comprehension based on the questions asked and how developers are asking them. Google should also tap into the code repository metadata such as build dependencies, developers involved, file and workflow permissions, tests executed, and relevant review comments since many of the questions asked during search include such pieces of information.”

2. Provide simple code examples

“There’s a strong desire to find code examples, for example illustrating API usage. To reduce duration of search sessions and keep developers productive, these examples should be minimal and illustrate common usage. Integrating the usage frequency for patterns of API usage into a ranking algorithm, or their size, could help developers find needed information faster.”

3. Consider code location

“Query features such as the -file: operator help support the specification of a search scope, but additional operators could help scope the search, for example, to code touched by specific developers or groups of developers (-dev:). Tools also should predict a developer’s locality needs based on search history and the files recently touched to better rank the matches.”

4. Consider richer context

“Search patterns vary across activities. A tool cognizant of the contextual elements (e.g., applications open, recent communications) associated with different activities could be more effective. For example, if a developer works with reviewing and bug tracking tools, then the search tool could give priority to matches that show code changes.”

5. Consider integrating code search into development environment

With search sessions lasting only a few minutes, and a small number of queries per session, the time to switch context between the development or code review environment and search tool becomes dominant. Integrating code search into the development environment could reduce that overhead.

Maybe it’s about NOT using Google

To end with the report we discussed first, ‘What do developers search for on the web?’: one remarkable result is that 63%(!) of the search queries ended in a visit to Stack Overflow, the Q&A site every single programmer knows about. If that is the case, why don’t we just start our search at Stack Overflow in the first place?

One of our own programmers concludes that the search engine of Stack Overflow ‘just s**ks’. That’s why he still prefers to include site:stackoverflow.com in his Google queries. This despite the fact that Stack Overflow’s search engine makes use of Elastic, a fast-growing and much-hailed company that recently went public. Elastic, which promotes itself as a natural replacement for Google Site Search, provides self-service software that makes data usable in real time and at scale for search and logging.
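
For readers who haven’t used the site: operator, here is a minimal sketch of how such a query could be turned into a search URL. The helper name google_site_search_url and the example query are hypothetical, and the https://www.google.com/search?q= address is an assumption based on how Google search URLs commonly look, not something taken from the studies:

```python
from urllib.parse import urlencode

def google_site_search_url(query: str, site: str = "stackoverflow.com") -> str:
    """Build a Google search URL restricted to one site via the site: operator."""
    # urlencode percent-encodes special characters, so the '+' signs in 'C++'
    # are not mistaken for spaces in the URL itself.
    params = {"q": f"{query} site:{site}"}
    return "https://www.google.com/search?" + urlencode(params)

url = google_site_search_url("C++ move semantics")
print(url)
```

The encoding only makes sure the literal characters reach the search engine; how the engine then treats symbols such as ++ is a separate matter, and that is exactly what the developers quoted earlier complain about.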

Stack Overflow also makes efforts to ease code search within IDEs, in this case Microsoft’s Visual Studio.

Or could we be addicted to Google? One of our software developers recently had to admit she doesn’t even really know what the homepage of Stack Overflow looks like, as she is so used to reaching the site through a Google search. Aside from a group that claims to use DuckDuckGo for private searching, are programmers stuck in a stubborn routine of googling?

New solutions

Beneath the surface, researchers are making new attempts to meet the search needs of developers. Take, for example, the recently released tool FaCoY, a search engine that accepts code fragments from users and recommends similar code fragments found in a target code base. FaCoY is based on query modification: after generating a code query summarizing the structural code elements in the input fragment, it uses Stack Overflow and GitHub data to find code snippets that have similar descriptions but vary in implementation. FaCoY then uses these variant implementations to generate alternate code queries.
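
To give a feel for what ‘accepting a code fragment and recommending similar fragments’ means, here is a deliberately simple sketch. It is not FaCoY’s algorithm: instead of query modification with Stack Overflow and GitHub data, it only compares the sets of identifier-like tokens using Jaccard similarity. The function names and the three-fragment code base are invented for illustration:

```python
import re

def tokens(code: str) -> set[str]:
    """Extract the set of identifier-like tokens from a code fragment."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0

def most_similar(query: str, code_base: dict[str, str]) -> list[tuple[str, float]]:
    """Rank fragments in a code base by token overlap with the query fragment."""
    q = tokens(query)
    scored = [(name, jaccard(q, tokens(body))) for name, body in code_base.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical code base, invented for this example.
code_base = {
    "read_lines": "with open(path) as f: lines = f.readlines()",
    "sum_list": "total = sum(values)",
    "read_text": "with open(path) as f: text = f.read()",
}
ranking = most_similar("f = open(path); data = f.read()", code_base)
print(ranking)
```

Here the read_text fragment ranks first because it shares four of the eight distinct tokens with the query. Token overlap like this misses fragments that do the same thing with different names, which is the kind of gap tools such as FaCoY try to close.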

Update: on Wednesday, GitHub Engineering introduced its new semantic code search tool, which is still in beta.



Also published on Medium.


1 Comment

  • Ian Kelly says:

    Great article!
    Our team has been working on this for over 2 years, we have come a long way in that time but there is a long way to go until we as an industry have the tools we need. We are working to Open Source our search aggregation, profiling, and ranking in the next few weeks https://pilot.codepilot.ai
