A Large-scale Study on API Misuses in the Wild

API misuses are prevalent and extremely harmful. Although various techniques have been proposed for API-misuse detection, it remains unclear how different types of API misuses are distributed and whether existing techniques cover all major types. Therefore, in this paper, we conduct the first large-scale empirical study on API misuses, based on 528,546 historical bug-fixing commits from GitHub (from 2011 to 2018). By leveraging GumTree, a state-of-the-art fine-grained AST differencing tool, we extract more than one million bug-fixing edit operations, 51.7% of which are API misuses. We further systematically classify API misuses into nine categories according to the edit operations and their context. We also extract various frequent API-misuse patterns based on the categories and corresponding operations, which can complement existing API-misuse detection tools. Our study reveals various practical guidelines regarding the importance of different types of API misuses. Furthermore, based on our dataset, we perform a user study in which we manually analyze the usage constraints of 10 patterns to explore whether the mined patterns can guide the design of future API-misuse detection tools. Specifically, we find that 7,541 potential misuses still exist in the latest versions of Apache projects, and 149 of them have been reported to developers. To date, 57 have been confirmed and fixed, while 15 have been rejected. The results indicate the importance of studying historical API misuses and the promise of employing our mined patterns to detect unknown API misuses.
