Open source software development and maintenance: an exploratory analysis

The purpose of this research was to create measures and models for the evaluation of Open Source Software (OSS) projects. To this end, an exploratory analysis of the development and maintenance processes in OSS was conducted. Data mining and text mining techniques were used to discover knowledge from transactional datasets maintained on OSS projects. Large and comprehensive datasets were used to formulate, test and validate the models. A new multidimensional measure of OSS project performance, called project viability, was defined and validated. A theoretical and empirical measurement framework was used to evaluate the new measure, and OSS project data from SourceForge.net was used to validate it. Results indicated that project viability is a measure of the performance of OSS projects. Three models were then created, one for each dimension of project viability. Multiple data mining techniques were used to create the models, with variables drawn from process, product, resource and end-user characteristics of the project. The use of new variables created through text mining improved the performance of the models. The first model was created for OSS projects in the development phase. The results indicated that end-user involvement could play a significant role in the development of OSS projects. It was also discovered that certain types of projects are more suitable for development in OSS communities. The second model was developed for OSS projects in their maintenance phase. A two-stage model for maintenance performance was selected. The results indicated that high project usage and usefulness could improve the maintenance performance of OSS projects. The third model was developed to investigate the effects of maintenance activities on the project's internal structure. Maintenance data for the Linux project was used to develop a new taxonomy for OSS maintenance patches. These results were then used to study the effects of various types of patches on the internal structure of the software. It was found that performing proactive maintenance on the software moderates its internal structure.
