Revisiting Assert Use in GitHub Projects

Assertions are often used to test the assumptions that developers have about a program. An assertion contains a boolean expression that developers believe to be true at a particular program point. It throws an error if the expression evaluates to false, which helps developers detect and correct bugs. Since assertions make developer assumptions explicit, they are also believed to improve the understandability of code. Recently, Casalnuovo et al. analysed C and C++ programs to understand the relationship between assertion usage and defect occurrence. Their results show that asserts have a small effect on reducing the density of bugs, and that developers often add asserts to methods of which they have prior knowledge and larger ownership. In this study, we perform a partial replication of the above study on a large dataset of Java projects from GitHub (185 projects, 20 million LOC, 4 million commits, 0.2 million files and 1 million methods). We collect metrics such as the number of asserts, number of defects, number of developers and number of lines changed in a method, and examine the relationship between asserts and defect occurrence. We also analyse the relationship of developer experience and ownership with the number of asserts. Furthermore, we study the different types of asserts that developers add and why they add them. We find that asserts have a small yet significant relationship with defect occurrence, and that developers who have added asserts to methods often have higher ownership of and experience with those methods than developers who did not add asserts.
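Because the projects studied here are written in Java, a small illustration of the assertion semantics described above may help. The sketch below is not taken from the study; the class `AssertDemo` and its methods are hypothetical. It uses Java's `assert condition : message;` form to state an assumption about an array index. One Java-specific detail matters when counting asserts as a quality signal: an `assert` statement only throws `AssertionError` when assertions are enabled at run time (for example with `java -ea`); by default the JVM skips them.

```java
// Hypothetical example of Java assert semantics; not code from the studied projects.
public class AssertDemo {

    // Returns values[i]. The assert makes the developer's assumption about i explicit.
    // It is only checked when assertions are enabled (java -ea AssertDemo).
    static int elementAt(int[] values, int i) {
        assert i >= 0 && i < values.length : "index out of range: " + i;
        return values[i];
    }

    // Reports whether assertions are enabled for this class: the assignment
    // inside the assert only executes when assertions are switched on.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4};
        System.out.println("assertions enabled: " + assertionsEnabled());
        System.out.println(elementAt(data, 2));
        try {
            elementAt(data, 5);
        } catch (AssertionError e) {
            // Reached only with -ea: the assert fires before the array access.
            System.out.println("assertion failed: " + e.getMessage());
        } catch (ArrayIndexOutOfBoundsException e) {
            // Without -ea the assert is skipped and the array access fails instead.
            System.out.println("unchecked access failed: " + e.getMessage());
        }
    }
}
```

Running `java AssertDemo` takes the unchecked-access path, while `java -ea AssertDemo` reports `assertion failed: index out of range: 5`, so the assumption is reported at the point where it was stated rather than where it eventually breaks.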

[1] Andrew Begel et al. Novice software developers, all over again. 2008, ICER '08.

[2] Xavier Blanc et al. Code ownership in open-source software. 2014, EASE '14.

[3] Jacques Klein et al. Got issues? Who cares about it? A large scale investigation of issue trackers from GitHub. 2013, 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE).

[4] Ralph Johnson et al. Design Patterns: Elements of Reusable Object-Oriented Software. 2019.

[5] Rahul Premraj et al. Network Versus Code Metrics to Predict Defects: A Replication Study. 2011, 2011 International Symposium on Empirical Software Engineering and Measurement.

[6] Paul Luo Li et al. What Makes a Great Software Engineer? 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[7] Giuliano Antoniol et al. Object-oriented design patterns recovery. 2001, J. Syst. Softw.

[8] Peter C. Rigby et al. Organizational Volatility and Post-release Defects: A Replication Case Study Using Data from Google Chrome. 2015, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.

[9] David Hovemeyer et al. Finding bugs is easy. 2004, SIGPLAN Notices.

[10] Premkumar T. Devanbu et al. Sample size vs. bias in defect prediction. 2013, ESEC/FSE 2013.

[11] Audris Mockus et al. Organizational volatility and its effects on software defects. 2010, FSE '10.

[12] Daniela E. Damian et al. The promises and perils of mining GitHub. 2014, MSR 2014.

[13] Jacob Cohen et al. Applied multiple regression/correlation analysis for the behavioral sciences. 1979.

[14] David Lo et al. Adoption of Software Testing in Open Source Projects: A Preliminary Study on 50,000 Projects. 2013, 2013 17th European Conference on Software Maintenance and Reengineering.

[15] Bertrand Meyer et al. Applying 'design by contract'. 1992, Computer.

[16] Lauretta O. Osho et al. Axiomatic Basis for Computer Programming. 2013.

[17] Hong Sun et al. Investigating the use of analysis contracts to support fault isolation in object oriented code. 2002, ISSTA '02.

[18] Jacek Czerwonka et al. Code Ownership and Software Quality: A Replication Study. 2015, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.

[19] Matthias M. Müller et al. Two controlled experiments concerning the usefulness of assertions as a means for programming. 2002, International Conference on Software Maintenance, 2002. Proceedings.

[20] Bella Martin et al. Universal Methods of Design: 100 Ways to Research Complex Problems, Develop Innovative Ideas, and Design Effective Solutions. 2012.

[21] David Lo et al. Popularity, Interoperability, and Impact of Programming Languages in 100,000 Open Source Projects. 2013, 2013 IEEE 37th Annual Computer Software and Applications Conference.

[22] Thomas D. LaToza et al. Maintaining mental models: a study of developer work habits. 2006, ICSE.

[23] Premkumar T. Devanbu et al. Ecological inference in empirical software engineering. 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[24] Jean-Marc Jézéquel et al. Robustness and diagnosability of OO systems designed by contracts. 2001, Proceedings Seventh International Software Metrics Symposium.

[25] David Hovemeyer et al. Finding more null pointer bugs, but not too many. 2007, PASTE '07.

[26] Mangala Gowri Nanda et al. Accurate Interprocedural Null-Dereference Analysis for Java. 2009, 2009 IEEE 31st International Conference on Software Engineering.

[27] Greg Nelson et al. Extended static checking for Java. 2002, PLDI '02.

[28] Premkumar T. Devanbu et al. A large scale study of programming languages and code quality in GitHub. 2014, SIGSOFT FSE.

[29] Nachiappan Nagappan et al. Assessing the Relationship between Software Assertions and Faults: An Empirical Investigation. 2006, 2006 17th International Symposium on Software Reliability Engineering.

[30] Bertrand Meyer et al. Contracts in Practice. 2012, FM.

[31] Alan Shalloway et al. Design Patterns Explained: A New Perspective on Object-Oriented Design (2nd Edition) (Software Patterns Series). 2004.

[32] Michael D. Ernst et al. Case studies and tools for contract specifications. 2014, ICSE.

[33] J. T. Wulu et al. Regression analysis of count data. 2002.

[34] Timothy S. Gegg-Harrison et al. Studying program correctness by constructing contracts. 2003, ITiCSE '03.

[35] Jeffrey C. Carver et al. Replication types: towards a shared taxonomy. 2014, EASE '14.

[36] David Lo et al. Detecting similar repositories on GitHub. 2017, 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER).

[37] M. Sivasakthi et al. Learning difficulties of 'object-oriented programming paradigm using Java': students' perspective. 2011.

[38] Harald C. Gall et al. Don't touch my code!: examining the effects of ownership on software quality. 2011, ESEC/FSE '11.

[39] Premkumar T. Devanbu et al. Assert Use in GitHub Projects. 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[40] Burak Turhan et al. Conformance factor in test-driven development: initial results from an enhanced replication. 2014, EASE '14.

[41] Ronnie E. S. Santos et al. Investigations about replication of empirical studies in software engineering: preliminary findings from a mapping study. 2014, EASE '14.

[42] Patrice Chalin et al. Logical foundations of program assertions: what do practitioners want? 2005, Third IEEE International Conference on Software Engineering and Formal Methods (SEFM'05).

[43] Gabriele Bavota et al. How the Apache community upgrades dependencies: an evolutionary study. 2014, Empirical Software Engineering.

[44] Premkumar T. Devanbu et al. Recalling the "imprecision" of cross-project defect prediction. 2012, SIGSOFT FSE.

[45] David Lo et al. Why and how developers fork what from whom in GitHub. 2017, Empirical Software Engineering.

[46] Marco Torchiano et al. On the effectiveness of the test-first approach to programming. 2005, IEEE Transactions on Software Engineering.

[47] Bertrand Meyer et al. A comparative study of programmer-written and automatically inferred contracts. 2009, ISSTA.

[48] David Lo et al. An Empirical Study of Adoption of Software Testing in Open Source Projects. 2013, 2013 13th International Conference on Quality Software.

[49] Nachiappan Nagappan et al. Predicting defects using network analysis on dependency graphs. 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[50] David Lo et al. A Large Scale Study of Multiple Programming Languages and Code Quality. 2016, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[51] Choo-Yee Ting et al. Learning Difficulties in Programming Courses: Undergraduates' Perspective and Perception. 2009, 2009 International Conference on Computer Technology and Development.