Sourcerer: An internet-scale software repository

Vast quantities of open source code are now available online, presenting a great potential resource for software developers. Yet the current generation of open source code search engines fails to take advantage of the rich structural information contained in the code they index. We have developed Sourcerer, an infrastructure for large-scale indexing and analysis of open source code. By taking full advantage of this structural information, Sourcerer provides a foundation upon which state-of-the-art search engines and related tools can easily be built. We describe the Sourcerer infrastructure, present the applications that we have built on top of it, and discuss how existing tools could benefit from using Sourcerer.
