Saturday, June 23, 2007

Searching Code

Integrated Development Environments (IDEs) are stuck in the doldrums when it comes to finding the right bit of code for a job. Today's Web 2.0 applications lead the way in information retrieval, but their techniques have not made it into the world of software development.

IDEs today

Current IDEs are based heavily around directory structures (namespaces) for grouping similar code together. Think System.XYZ. They have some simple tools for following references, e.g. go to the definition of a function or find all references to a function, and a rudimentary search engine.

The current state of the art in Web 2.0 Information Retrieval

For large sets of data, Search has kicked Directory structures into touch. Search, however, is not contextual (it's dumb and has no understanding of what you are trying to do). This leads to billions of results, many of which are irrelevant. Web 2.0 companies have tried to overcome this through tagging and mark-up (the Semantic Web), which carries more meaning but requires effort on the part of the user (i.e. adding that tag). This in turn is being overcome by smart algorithms that personalise your information based on your previous history of activities - think Amazon and "People who read this also read...".

What are we missing?
  1. Viewing information related to the task currently being undertaken - hey, you are changing interface X, here are all the classes which implement it. Or, you have just named a class Y, here are all the other classes with similar names
  2. Better Search - come on, can we get a "Code Rank" algorithm - here are the most popular functions/classes based on their links (references) - and don't give me "you need to decouple your code more", some operations are just more fundamental - like adding to a list
  3. Personalised Search - using the history of the work you have been doing to hone your search results into something useful
  4. Better Collaboration - none of this "I'm going to shove IM in a toolbar and let you get on with it". I want realtime visualisation of which of my team members are working on which bits of code, etc.
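To make the "Code Rank" idea in point 2 a bit more concrete, here is a minimal sketch that applies a PageRank-style iteration to a graph of code references. It is only an illustration under stated assumptions: the function name `code_rank`, the symbol names and the damping factor of 0.85 are all made up for the example, and a real IDE would have to extract the reference graph from the compiler or the repository.

```python
def code_rank(references, damping=0.85, iterations=50):
    """PageRank-style score for code symbols.

    `references` maps each symbol to the list of symbols it references
    (calls, implements, instantiates...). All names are hypothetical.
    """
    # Collect every symbol, including ones that are only ever referenced.
    symbols = set(references)
    for targets in references.values():
        symbols.update(targets)
    n = len(symbols)

    # Start with an equal score for every symbol.
    rank = {s: 1.0 / n for s in symbols}

    for _ in range(iterations):
        # Baseline score every symbol receives regardless of links.
        new_rank = {s: (1.0 - damping) / n for s in symbols}
        for source in symbols:
            targets = references.get(source, [])
            if targets:
                # A symbol passes its score on to whatever it references.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A symbol that references nothing spreads its score evenly.
                share = damping * rank[source] / n
                for s in symbols:
                    new_rank[s] += share
        rank = new_rank
    return rank


# Hypothetical reference graph: three services that all add to a list.
example = {
    "OrderService": ["List.add", "Logger.log"],
    "CartService": ["List.add"],
    "ReportJob": ["List.add", "Logger.log"],
}
ranks = code_rank(example)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

In this toy graph `List.add` comes out on top simply because everything references it, which is exactly the "some operations are more fundamental" point: a high rank here means "widely used", not "badly coupled".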

The Solution

  1. Make the IDE a web app running on a server, with a back end integrated into the Repository = Solves 4 for Collaboration and also gives you Version Control
  2. Run all builds on the server side = Faster Builds
  3. Update the UI to include contextual results panes = Solves 1
  4. Add a server-side search engine that cross-correlates users' activity = Solves 2 and 3 for personalisation
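As a rough sketch of what the server-side search in point 4 might do, the snippet below re-ranks plain substring matches using a developer's recent activity. Everything in it is an assumption for illustration: the function `personalised_search`, the `boost` weight, the symbol names and the idea that the server keeps a simple list of recently opened symbols per user.

```python
from collections import Counter


def personalised_search(query, symbols, history, boost=0.5):
    """Return symbols matching `query`, most relevant to this user first.

    `history` is a hypothetical list of symbols the user opened recently;
    each past visit nudges a matching symbol up the results.
    """
    visits = Counter(history)
    needle = query.lower()
    scored = []
    for name in symbols:
        # Dumb text match first: the query must appear in the name.
        if needle in name.lower():
            # Then personalise: more visits means a higher score.
            scored.append((1.0 + boost * visits[name], name))
    # Highest score first; ties fall back to alphabetical order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Hypothetical code base and activity log for one developer.
symbols = ["billing.InvoiceParser", "web.RequestParser", "core.Parser"]
history = ["web.RequestParser", "web.RequestParser", "core.Parser"]
print(personalised_search("parser", symbols, history))
# prints ['web.RequestParser', 'core.Parser', 'billing.InvoiceParser']
```

Because the history lives on the server, the same log could just as easily be pooled across the team to give a "developers who edited this also edited..." style of result.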

Of course, this does come with a downside - we developers like to have the fastest, whizziest PC beast on the block, and if the IDE is running as a web app we're not going to get the budget - Damn!