Leveraging usage similarity for effective retrieval of examples in code repositories

Developers often learn to use APIs (Application Programming Interfaces) by studying existing examples of API usage, and code repositories contain many such examples. However, conventional information retrieval techniques perform poorly at retrieving API usage examples from code repositories. This paper presents Structural Semantic Indexing (SSI), a technique that associates words with source code entities based on similarities in API usage. The heuristic behind the technique is that entities (classes, methods, etc.) that use APIs in similar ways are semantically related because they do similar things. We evaluate the effectiveness of SSI in code retrieval by comparing three SSI-based retrieval schemes with two conventional baseline schemes, running a set of 20 candidate queries against a repository containing 222,397 source code entities from 346 jars belonging to the Eclipse framework. The results show that SSI is effective in improving the retrieval of examples in code repositories.
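The heuristic can be illustrated with a minimal sketch. The abstract does not say how usage similarity is measured or how words are shared, so everything here is an assumption made for illustration: Jaccard similarity over sets of called API names, a fixed threshold of 0.5, similarity-weighted term propagation, and the hypothetical entity names in the sample data. It should not be read as the SSI implementation.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of API names (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def propagate_terms(entities, threshold=0.5):
    """Share index terms between entities whose API usage is similar.

    `entities` maps an entity name to (set of APIs used, set of own terms).
    An entity's own terms get weight 1.0; terms borrowed from a sufficiently
    similar entity are weighted by the similarity score.
    """
    enriched = {}
    for name, (apis, terms) in entities.items():
        weights = Counter({t: 1.0 for t in terms})
        for other, (other_apis, other_terms) in entities.items():
            if other == name:
                continue
            sim = jaccard(apis, other_apis)
            if sim >= threshold:  # threshold is an illustrative assumption
                for t in other_terms:
                    weights[t] += sim
        enriched[name] = dict(weights)
    return enriched


# Hypothetical entities: two methods that copy files and one unrelated parser.
entities = {
    "FileCopier.copy": (
        {"FileInputStream.read", "FileOutputStream.write", "File.exists"},
        {"file", "copy"},
    ),
    "Backup.duplicate": (
        {"FileInputStream.read", "FileOutputStream.write"},
        {"backup", "duplicate"},
    ),
    "Parser.parse": ({"StringTokenizer.nextToken"}, {"parse"}),
}

enriched = propagate_terms(entities)
```

In this toy data, `Backup.duplicate` never mentions the word "copy", but it shares two of three APIs with `FileCopier.copy`, so it picks up "copy" as an index term and becomes retrievable by a query for that word. `Parser.parse` shares no APIs with the others and keeps only its own term.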
