Is Popularity a Measure of Quality? An Analysis of Maven Components

One of the perceived values of open source software is the idea that many eyes can increase code quality and reduce the number of bugs. This perception, however, has been questioned by some due to the lack of supporting evidence. This paper presents an empirical analysis focusing on the relationship between the utilization of open source components and their engineering quality. In this study, we determine the popularity of 2,406 Maven components by calculating their usage across 55,191 open source Java projects. As a proxy of code quality for a component, we calculate (i) its defect density using the set of bug patterns reported by FindBugs, and (ii) nine popular software quality metrics from the SQO-OSS quality model. We then look for correlations between (i) popularity and defect density, and (ii) popularity and software quality metrics. In most cases, no correlations were found. In cases where minor correlations exist, they are driven by component size. Statistically speaking, and using the methods in this study, the Maven repository does not seem to support the "many eyeballs" effect. We conjecture that the utilization of open source components is driven by factors other than their engineering quality, an interpretation that is supported by the findings in this study.
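
The kind of analysis described above can be illustrated with a minimal sketch. It assumes that defect density is measured as reported bug patterns per thousand lines of code and that a rank-based (Spearman) correlation is used; the abstract does not state either choice, so both are assumptions. The component data is made up for illustration, and the function names are hypothetical.

```python
from statistics import mean


def defect_density(bug_count, lines_of_code):
    """Bugs per thousand lines of code (an assumed definition)."""
    return bug_count / (lines_of_code / 1000.0)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical toy data, NOT from the study:
# (projects using the component, reported bug patterns, lines of code)
components = [
    (1200, 40, 20000),
    (300, 10, 2500),
    (50, 6, 6000),
    (900, 135, 45000),
    (10, 9, 1500),
]

popularity = [c[0] for c in components]
density = [defect_density(c[1], c[2]) for c in components]
size = [c[2] for c in components]

rho_density = spearman(popularity, density)  # popularity vs. defect density
rho_size = spearman(popularity, size)        # popularity vs. component size

print(f"popularity vs. defect density: {rho_density:.2f}")
print(f"popularity vs. size:           {rho_size:.2f}")
```

Computing the correlation with size alongside the correlation with defect density is one simple way to check whether an apparent relationship between popularity and a quality measure could be explained by component size.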
