Automated dependency resolution for open source software

Opportunities for software reuse are plentiful, thanks in large part to the widespread adoption of open source processes and the availability of search engines for locating relevant artifacts. One challenge of reusing open source software is simply getting a newly downloaded artifact to build and run in the first place. The artifact itself likely reuses other artifacts, so those dependencies must be located before it can function properly. While merely tedious for an individual project, this can cause serious difficulties for those seeking to study open source software at scale: it is not feasible to manually resolve dependencies for thousands of projects, and many forms of analysis require declarative completeness. In this paper we present a method for automatically resolving dependencies for open source software. It works by cross-referencing a project's missing type information with a repository of candidate artifacts. We have implemented this method on top of Sourcerer, an infrastructure for the large-scale indexing and analysis of open source code. We evaluated the resolution algorithm in two parts. First, for a small number of popular open source projects, we manually examined the artifacts suggested by our system to determine whether they were appropriate. Second, we applied the algorithm to the 13,241 projects in the Sourcerer managed repository to measure the rate of resolution success. The results demonstrate the feasibility of this approach: the algorithm located all of the required artifacts for 3,904 additional projects, increasing the percentage of declaratively complete projects in Sourcerer from 39% to 69%.
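
The cross-referencing idea can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give; it assumes a simple greedy strategy in which each candidate artifact is described by the set of fully qualified type names it provides, and artifacts are picked in order of how many still-missing types they cover. The function names (`build_index`, `resolve`) and the artifact names in the usage example are hypothetical.

```python
from collections import defaultdict


def build_index(artifacts):
    """Map each fully qualified type name to the set of artifacts providing it."""
    index = defaultdict(set)
    for artifact, provided_types in artifacts.items():
        for type_name in provided_types:
            index[type_name].add(artifact)
    return index


def resolve(missing_types, artifacts):
    """Greedily pick artifacts until no remaining missing type can be covered.

    Assumed strategy (not taken from the paper): at each step choose the
    artifact that provides the most still-unresolved types.
    Returns (chosen artifacts in pick order, set of types left unresolved).
    """
    index = build_index(artifacts)
    unresolved = set(missing_types)
    chosen = []
    while unresolved:
        # For every candidate, collect the unresolved types it would cover.
        coverage = defaultdict(set)
        for type_name in unresolved:
            for artifact in index.get(type_name, ()):
                coverage[artifact].add(type_name)
        if not coverage:
            break  # no artifact in the repository provides the remaining types
        # Largest coverage wins; ties are broken alphabetically for determinism.
        best = min(coverage, key=lambda a: (-len(coverage[a]), a))
        chosen.append(best)
        unresolved -= coverage[best]
    return chosen, unresolved


# Hypothetical repository of candidate artifacts and the types they provide.
artifacts = {
    "commons-logging.jar": {
        "org.apache.commons.logging.Log",
        "org.apache.commons.logging.LogFactory",
    },
    "log4j.jar": {"org.apache.log4j.Logger"},
    "junit.jar": {"junit.framework.TestCase", "junit.framework.Assert"},
}

# Types a hypothetical project references but does not declare itself.
missing = {
    "org.apache.commons.logging.Log",
    "org.apache.commons.logging.LogFactory",
    "junit.framework.TestCase",
    "com.example.Unknown",
}

chosen, unresolved = resolve(missing, artifacts)
```

In this sketch a project counts as fully resolved only when the returned `unresolved` set is empty, which mirrors the notion of declarative completeness used above. Here one type has no provider in the repository, so the project would remain incomplete even after two artifacts are selected.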
