A Large Scale Study of Multiple Programming Languages and Code Quality

Most software systems today use multiple programming languages, choosing each language for its strengths in implementing particular functionality. Prior research has studied how individual programming languages affect software quality, but there has been little or no research on how using multiple languages together affects it. Does the use of multiple languages cause more bugs? Do certain languages make software more bug prone when they are used alongside other languages? What are the relationships between multi-language usage and various bug categories? In this study, we perform a large-scale empirical investigation to provide some answers to these questions. We gather a large dataset of popular projects from GitHub (628 projects, 85 million SLOC, 134 thousand authors, 3 million commits, in 17 languages) to understand the impact of using multiple languages on software quality. We build multiple regression models to study the effects of using different languages on the number of bug-fixing commits while controlling for factors such as project age, project size, team size, and the number of commits. Our results show that, in general, implementing a project with more languages has a significant effect on project quality: it increases defect proneness. Moreover, we find specific languages that are statistically significantly more defect prone when they are used in a multi-language setting. These include popular languages such as C++, Objective-C, and Java. Furthermore, we note that the use of more languages significantly increases bug proneness across all bug categories. The effect is strongest for memory, concurrency, and algorithm bugs.
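The regression setup described above can be illustrated with a minimal sketch. It is not the study's actual model: the abstract does not state the model family, so this example assumes an ordinary least-squares fit on log-transformed counts. The data are synthetic, and the predictor names (`n_languages`, `log_age`, `log_size`, `log_team`, `log_commits`), the helper functions, and the simulated effect size of 0.10 are all hypothetical choices made for illustration.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def make_synthetic_projects(n, rng, true_lang_effect=0.10):
    """Simulate projects; every number here is made up, not from the study."""
    rows, y = [], []
    for _ in range(n):
        n_langs = rng.randint(1, 8)
        log_age = math.log(rng.uniform(100, 4000))      # project age (days)
        log_size = math.log(rng.uniform(1e3, 1e6))      # project size (SLOC)
        log_team = math.log(rng.uniform(2, 500))        # team size (authors)
        log_commits = math.log(rng.uniform(50, 50000))  # total commits
        # Response: log of the number of bug-fixing commits, plus noise.
        log_bugfix = (-1.0 + true_lang_effect * n_langs + 0.05 * log_age
                      + 0.10 * log_size + 0.10 * log_team
                      + 0.70 * log_commits + rng.gauss(0, 0.05))
        rows.append([1.0, n_langs, log_age, log_size, log_team, log_commits])
        y.append(log_bugfix)
    return rows, y


NAMES = ["intercept", "n_languages", "log_age", "log_size",
         "log_team", "log_commits"]

rows, y = make_synthetic_projects(500, random.Random(0))
coef = dict(zip(NAMES, fit_ols(rows, y)))

for name in NAMES:
    print(f"{name:12s} {coef[name]: .3f}")
```

Because the controls enter the model alongside `n_languages`, the fitted `n_languages` coefficient estimates the association between language count and bug-fixing commits with age, size, team size, and commit volume held fixed. A positive coefficient in a fit of this shape would correspond to the abstract's finding that more languages go with higher defect proneness. Count models such as Poisson or negative binomial regression are a common alternative for data like this.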
