Large-Scale Code Reuse in Open Source Software

We explore the practice of large-scale reuse, that is, reuse involving at least a group of source code files. Our research questions are to determine the extent of such reuse in open source projects, to identify the code that is reused the most, and to investigate patterns of large-scale reuse. We start by assembling a sample of projects that includes all code in several large repositories of open source projects, all projects bundled with popular distributions of Linux and BSD, and several large individual projects. We then obtain the source code, identify groups of files reused among projects, and determine the code that is most widely reused in our sample. Our findings indicate that more than 50% of the files were used in more than one project. The most widely reused components were small; they comprised templates requiring major and minor modifications and a group of files reused without any change. Some widely reused components involved hundreds of files.
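
The step of identifying files reused among projects can be illustrated with a minimal sketch. It assumes exact content matching through a SHA-1 digest, which the abstract does not state as the method used; the function names (`content_hash`, `reuse_index`, `reused_fraction`, `shared_groups`) and the in-memory `projects` mapping are hypothetical and chosen only for illustration. Exact matching finds only files copied without change, so templates reused with modifications would require a looser form of matching.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    # Assumption: byte-identical content is treated as "the same file".
    return hashlib.sha1(data).hexdigest()


def reuse_index(projects):
    """Map each content digest to the set of projects that contain it.

    `projects` is a hypothetical mapping:
        project name -> {file path -> file content as bytes}
    """
    index = defaultdict(set)
    for project, files in projects.items():
        for data in files.values():
            index[content_hash(data)].add(project)
    return index


def reused_fraction(projects):
    """Fraction of files whose content appears in more than one project."""
    index = reuse_index(projects)
    total = 0
    reused = 0
    for files in projects.values():
        for data in files.values():
            total += 1
            if len(index[content_hash(data)]) > 1:
                reused += 1
    return reused / total if total else 0.0


def shared_groups(projects):
    """Group shared digests by the exact set of projects that share them.

    A large group for one set of projects hints at a component
    (many files) reused together rather than a single stray file.
    """
    groups = defaultdict(set)
    for digest, owners in reuse_index(projects).items():
        if len(owners) > 1:
            groups[frozenset(owners)].add(digest)
    return groups
```

Applied to a real sample, the `projects` mapping would be filled by walking each project's source tree; it is kept in memory here so that the sketch stays self-contained.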
