The Metrics to Evaluate the Health Status of OSS Projects Based on Factor Analysis

As open-source software (OSS) development becomes mainstream, an increasing number of businesses and developers are joining OSS projects. For project managers, developers and users, understanding the current health status of a project is important when managing the development process, choosing which open-source projects to contribute to, or deciding whether to adopt the software packages a project produces. An efficient approach to evaluating the health status of open-source projects is therefore needed. Unfortunately, although many approaches, including metrics, have been proposed, they were designed in an ad hoc way. In this paper, a mathematical tool, factor analysis, is used to build a health evaluation model for OSS projects. To the best of our knowledge, this is the first time factor analysis has been applied to evaluate OSS projects. The model is based on GitHub data and takes as input basic indexes that are closely related to project health. Factor analysis then yields six new synthetic metrics, namely community activity, project popularity, development activity, completeness, responsiveness and persistence, which are used to calculate the overall health score of a project. Finally, to verify the model's effectiveness, we apply it to a set of real projects; the results show that the overall scores produced by the model can reflect the health status of those projects.
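The abstract does not spell out the computation. As a rough illustration only, the following Python sketch shows one common way to build a factor-analysis composite score: standardize the basic indexes, form their correlation matrix, extract factors by the principal-component method, keep factors whose eigenvalue exceeds 1 (the Kaiser criterion), compute factor scores, and combine them with weights proportional to each factor's eigenvalue. The extraction method, the absence of a rotation step, the weighting scheme, the helper names `jacobi_eigen` and `health_scores`, and the toy data are all assumptions for illustration, not the model described in this paper.

```python
import math
import statistics


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues/eigenvectors of a symmetric matrix via cyclic Jacobi rotations.

    Returns (eigenvalues, V) where column k of V is the eigenvector for eigenvalues[k].
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def health_scores(data, n_factors=None):
    """Hypothetical helper: composite score per project from a projects x indexes matrix.

    Assumptions: principal-component factor extraction, no rotation,
    Kaiser criterion (eigenvalue > 1) when n_factors is None, and factor
    weights proportional to each retained factor's eigenvalue.
    """
    n, m = len(data), len(data[0])
    cols = list(zip(*data))
    means = [statistics.mean(col) for col in cols]
    sds = [statistics.stdev(col) for col in cols]  # fails if an index is constant
    z = [[(row[j] - means[j]) / sds[j] for j in range(m)] for row in data]
    # Correlation matrix of the standardized indexes (n - 1 denominator).
    corr = [[sum(z[i][r] * z[i][c] for i in range(n)) / (n - 1) for c in range(m)]
            for r in range(m)]
    vals, vecs = jacobi_eigen(corr)
    order = sorted(range(m), key=lambda k: vals[k], reverse=True)
    if n_factors is None:
        n_factors = max(1, sum(1 for k in order if vals[k] > 1.0))
    kept = order[:n_factors]
    total = sum(vals[k] for k in kept)
    overall = [0.0] * n
    for k in kept:
        vec = [vecs[j][k] for j in range(m)]
        if sum(vec) < 0:  # eigenvector sign is arbitrary; assume larger indexes mean healthier
            vec = [-x for x in vec]
        weight = vals[k] / total
        for i in range(n):
            # Regression-method factor score under principal-component extraction.
            f_ik = sum(z[i][j] * vec[j] for j in range(m)) / math.sqrt(vals[k])
            overall[i] += weight * f_ik
    return overall, [vals[k] for k in kept]
```

For example, `health_scores([[1, 10, 100], [2, 20, 190], [3, 31, 320], [4, 39, 410], [5, 52, 480]])` treats each row as a project and each column as a hypothetical index. Because the three columns are almost perfectly correlated, only one factor is retained, and projects with uniformly larger indexes receive higher scores. A fuller model would often also apply a rotation such as varimax before interpreting the factors as named metrics like community activity or responsiveness.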
