Change profile analysis of open-source software systems to understand their evolutionary behavior

Source code management systems (such as git) record changes to the code repositories of Open-Source Software (OSS) projects. The metadata of each change includes a change message that records the intention behind the change. Classification of changes into change types, based on change messages, has been explored in the past to understand the evolution of software systems, but only from the perspective of change size and change density. Software evolution analysis based on change classification with a focus on change evolution patterns remains an open research problem. This study examines the change messages of 106 OSS projects, as recorded in their git repositories, to explore their evolutionary patterns with respect to the types of changes performed over time. An automated keyword-based classifier is applied to the change messages to categorize the changes into five types (corrective, adaptive, perfective, preventive, and enhancement). Cluster analysis helps to uncover the distinct change patterns that each change type follows. For each change type, we identify three categories among the 106 projects: high activity, moderate activity, and low activity. Evolutionary behavior differs across these categories. Projects with high and moderate activity receive their maximum changes during 76–81 months of the project lifetime. Project attributes such as the number of committers, the number of files changed, and the total number of commits seem to contribute the most to the change activity of the projects. The statistical findings show that, irrespective of change type, the change activity of a project is related to its number of contributors, amount of work done, and total commits. Further, we explore the languages and domains of the projects to correlate change types with them. The statistical analysis indicates no significant or strong relation between change types and the domains and languages of the 106 projects.
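As a rough illustration of what a keyword-based change classifier can look like, the following Python sketch assigns a change message to one of the five change types by counting keyword matches. The keyword stems, the function name `classify`, the tie-breaking rule, and the `unclassified` fallback are all assumptions made for this example; the study's actual keyword lists and matching rules are not given in this text.

```python
import re

# Hypothetical keyword stems for each change type (illustrative only;
# these are NOT the keyword lists used in the study).
KEYWORDS = {
    "corrective": ("fix", "bug", "error", "fail", "crash"),
    "adaptive": ("port", "upgrade", "migrate", "compatib"),
    "perfective": ("refactor", "cleanup", "optimiz", "simplif"),
    "preventive": ("test", "doc", "comment", "deprecat"),
    "enhancement": ("add", "implement", "feature", "support"),
}


def classify(message):
    """Return the change type with the most keyword hits in a change message.

    A word counts as a hit when it starts with one of the type's stems.
    Ties go to the type listed first in KEYWORDS; messages with no hits
    are labeled 'unclassified'.
    """
    words = re.findall(r"[a-z]+", message.lower())
    best_type, best_hits = "unclassified", 0
    for change_type, stems in KEYWORDS.items():
        # str.startswith accepts a tuple of prefixes.
        hits = sum(1 for word in words if word.startswith(stems))
        if hits > best_hits:
            best_type, best_hits = change_type, hits
    return best_type
```

Applying such a function to every change message in a repository and tallying the labels per month would give one time series per change type, which is the kind of input a subsequent cluster analysis could group into high, moderate, and low activity.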
