Mining open source web repositories to measure the cost of evolutionary reuse

This paper proposes evolutionary reuse as a metric for measuring the effect of the maintenance and replacement decisions made by open source developers and for relating those decisions to cost efficiency. Evolutionary reuse is defined as the similarity of code between two versions of the same application. Maintenance can be seen as the creation of a new version of an application with a high degree of evolutionary reuse. Conversely, replacement takes place when a significant part of the code is re-implemented from scratch and evolutionary reuse is low. The paper proposes an empirical model that measures evolutionary reuse and development costs by mining open source software repositories. A total of 171 application versions from 26 projects were analyzed. Results show that maintenance choices in an open source context are not always cost efficient. Developers tend to maximize the reuse of code from the most recent version of an application, even when the requirements that version was built for are far from current needs. Consequently, development costs per new line of code are found to grow with evolutionary reuse.
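
As an illustration of the definition above, evolutionary reuse between two versions could be approximated as the share of lines in the new version that also appear, in order, in the old one. The following is a minimal sketch under that assumption; it uses line-level matching from Python's standard `difflib` module, and the function name `evolutionary_reuse` is hypothetical. It is not the empirical model used in the paper.

```python
from difflib import SequenceMatcher


def evolutionary_reuse(old_lines, new_lines):
    """Approximate evolutionary reuse as the fraction of lines in the
    new version that are matched, in order, by lines in the old version.

    Assumption: line-level textual matching stands in for code similarity.
    """
    if not new_lines:
        return 0.0
    # autojunk=False disables the heuristic that ignores very frequent lines.
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    reused = sum(block.size for block in matcher.get_matching_blocks())
    return reused / len(new_lines)


if __name__ == "__main__":
    old_version = ["a = 1", "b = 2", "c = a + b", "print(c)"]
    new_version = ["a = 1", "b = 2", "c = a * b", "print(c)"]
    # High values suggest maintenance; low values suggest replacement.
    print(evolutionary_reuse(old_version, new_version))
```

Under this reading, a value close to 1 corresponds to maintenance and a value close to 0 corresponds to replacement.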
