Guided pattern mining for API misuse detection by change-based code analysis

Lack of experience, inadequate documentation, and sub-optimal API design frequently cause developers to make mistakes when re-using third-party implementations. Such API misuses can result in unintended behavior, performance losses, or software crashes. Therefore, current research aims to automatically detect such misuses by comparing the way a developer used an API to previously inferred patterns of the correct API usage. While research has made significant progress, these techniques have not yet been adopted in practice. In part, this is due to the lack of a process capable of seamlessly integrating with software development processes. Particularly, existing approaches do not consider how to collect relevant source code samples from which to infer patterns. In fact, an inadequate collection can cause API usage pattern miners to infer irrelevant patterns which leads to false alarms instead of finding true API misuses. In this paper, we target this problem (a) by providing a method that increases the likelihood of finding relevant and true-positive patterns concerning a given set of code changes and agnostic to a concrete static, intra-procedural mining technique and (b) by introducing a concept for just-in-time API misuse detection which analyzes changes at the time of commit. Particularly, we introduce different, lightweight code search and filtering strategies and evaluate them on two real-world API misuse datasets to determine their usefulness in finding relevant intra-procedural API usage patterns. Our main results are (1) commit-based search with subsequent filtering effectively decreases the amount of code to be analyzed, (2) in particular method-level filtering is superior to file-level filtering, (3) project-internal and project-external code search find solutions for different types of misuses and thus are complementary, (4) incorporating prior knowledge of the misused API into the search has a negligible effect.

[1]  Swarat Chaudhuri,et al.  Bayesian specification learning for finding API usage errors , 2017, ESEC/SIGSOFT FSE.

[2]  Sushil Krishna Bajracharya,et al.  CodeGenie: using test-cases to search and reuse source code , 2007, ASE '07.

[3]  RoychoudhuryAbhik,et al.  Automated program repair , 2019 .

[4]  Hong Cheng,et al.  Searching connected API subgraph via text phrases , 2012, SIGSOFT FSE.

[5]  William G. Griswold,et al.  Dynamically discovering likely program invariants to support program evolution , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[6]  Chanchal Kumar Roy,et al.  Useful, But Usable? Factors Affecting the Usability of APIs , 2011, 2011 18th Working Conference on Reverse Engineering.

[7]  Hridesh Rajan,et al.  Boa: A language and infrastructure for analyzing ultra-large-scale software repositories , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[8]  Qi Xin,et al.  Leveraging syntax-related code for automated program repair , 2017, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[9]  Yuming Zhou,et al.  Boosting crash-inducing change localization with rank-performance-based feature subset selection , 2020, Empirical Software Engineering.

[10]  Trong Duc Nguyen,et al.  Exploring API Embedding for API Usages and Applications , 2017, 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE).

[11]  Ali Mesbah,et al.  An Empirical Study of Client-Side JavaScript Bugs , 2013, 2013 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement.

[12]  Cristina V. Lopes,et al.  SourcererCC: Scaling Code Clone Detection to Big-Code , 2015, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[13]  Mira Mezini,et al.  Ieee Transactions on Software Engineering 1 Automated Api Property Inference Techniques , 2022 .

[14]  Miltiadis Allamanis,et al.  Proceedings of the 22Nd ACM SIGSOFT International Symposium on Foundations of Software Engineering , 2014 .

[15]  David Lo,et al.  History Driven Program Repair , 2016, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[16]  Zhenchang Xing,et al.  API Method Recommendation without Worrying about the Task-API Knowledge Gap , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[17]  F. Yates Contingency Tables Involving Small Numbers and the χ2 Test , 1934 .

[18]  David Lo,et al.  Automatic recommendation of API methods from feature requests , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[19]  Benjamin Livshits,et al.  DynaMine: finding common error patterns by mining software revision histories , 2005, ESEC/FSE-13.

[20]  Rainer Koschke,et al.  Survey of Research on Software Clones , 2006, Duplication, Redundancy, and Similarity in Software.

[21]  Georgios Gousios,et al.  The GHTorent dataset and tool suite , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[22]  Zhenchang Xing,et al.  Mining Likely Analogical APIs Across Third-Party Libraries via Large-Scale Unsupervised API Semantics Embedding , 2019, IEEE Transactions on Software Engineering.

[23]  Kathryn T. Stolee,et al.  Solving the Search for Source Code , 2014, ACM Trans. Softw. Eng. Methodol..

[24]  George C. Necula,et al.  Mining Temporal Specifications for Error Detection , 2005, TACAS.

[25]  Andreas Zeller,et al.  When do changes induce fixes? , 2005, ACM SIGSOFT Softw. Eng. Notes.

[26]  Manuvir Das,et al.  Perracotta: mining temporal API rules from imperfect traces , 2006, ICSE.

[27]  David Lo,et al.  Beyond support and confidence: Exploring interestingness measures for rule-based specification mining , 2015, 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[28]  David Lo,et al.  Automated library recommendation , 2013, 2013 20th Working Conference on Reverse Engineering (WCRE).

[29]  Qi Xin,et al.  Better Code Search and Reuse for Better Program Repair , 2019, 2019 IEEE/ACM International Workshop on Genetic Improvement (GI).

[30]  Chanchal Kumar Roy,et al.  Comparison and evaluation of code clone detection techniques and tools: A qualitative approach , 2009, Sci. Comput. Program..

[31]  Zhendong Su,et al.  An Empirical Study on Real Bug Fixes , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[32]  Yu Zhou,et al.  Analyzing APIs Documentation and Code to Detect Directive Defects , 2017, 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE).

[33]  Mira Mezini,et al.  "Jumping Through Hoops": Why do Java Developers Struggle with Cryptography APIs? , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[34]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[35]  Robert Heumüller,et al.  Commits as a basis for API misuse detection , 2018, SoftwareMining@ASE.

[36]  Martin P. Robillard,et al.  A field study of API learning obstacles , 2011, Empirical Software Engineering.

[37]  Andreas Zeller,et al.  Mining temporal specifications from object usage , 2011, Automated Software Engineering.

[38]  R. Holmes,et al.  Using structural context to recommend source code examples , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[39]  Zhenmin Li,et al.  PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code , 2005, ESEC/FSE-13.

[40]  Shijie Zhang,et al.  Propagating Bug Fixes with Fast Subgraph Matching , 2010, 2010 IEEE 21st International Symposium on Software Reliability Engineering.

[41]  Dongmei Zhang,et al.  CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[42]  Zhendong Su,et al.  Javert: fully automatic mining of general temporal properties from dynamic traces , 2008, SIGSOFT '08/FSE-16.

[43]  James R. Larus,et al.  Mining specifications , 2002, POPL '02.

[44]  Hoan Anh Nguyen,et al.  Graph-based mining of multiple object usage patterns , 2009, ESEC/FSE '09.

[45]  Fan Long,et al.  Automatic patch generation by learning correct code , 2016, POPL.

[46]  Atul Prakash,et al.  A Framework for Source Code Search Using Program Patterns , 1994, IEEE Trans. Software Eng..

[47]  Zhenchang Xing,et al.  What do developers search for on the web? , 2017, Empirical Software Engineering.

[48]  Rastislav Bodík,et al.  Jungloid mining: helping to navigate the API jungle , 2005, PLDI '05.

[49]  Tim Menzies,et al.  Is "Better Data" Better Than "Better Data Miners"? , 2017, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[50]  Kathryn T. Stolee,et al.  Evaluating How Developers Use General-Purpose Web-Search for Code Retrieval , 2018, 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR).

[51]  Martin White,et al.  Deep learning code fragments for code clone detection , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[52]  Mira Mezini,et al.  A Systematic Evaluation of Static API-Misuse Detectors , 2017, IEEE Transactions on Software Engineering.

[53]  Collin McMillan,et al.  Portfolio: finding relevant functions and their usage , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[54]  William G. Griswold,et al.  Dynamically discovering likely program invariants to support program evolution , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[55]  Yuriy Brun,et al.  API Blindspots: Why Experienced Developers Write Vulnerable Code , 2018, SOUPS @ USENIX Security Symposium.

[56]  David Lo,et al.  AUSearch: Accurate API Usage Search in GitHub Repositories with Type Resolution , 2020, 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER).

[57]  Mark Harman,et al.  Evaluating Automatic Program Repair Capabilities to Repair API Misuses , 2021, IEEE Transactions on Software Engineering.

[58]  Kathryn T. Stolee,et al.  How developers search for code: a case study , 2015, ESEC/SIGSOFT FSE.

[59]  Rosalva E. Gallardo-Valencia,et al.  Internet-Scale Code Search , 2009, 2009 ICSE Workshop on Search-Driven Development-Users, Infrastructure, Tools and Evaluation.

[60]  Tao Xie,et al.  Parseweb: a programmer assistant for reusing open source code on the web , 2007, ASE.

[61]  Sven Apel,et al.  Views on Internal and External Validity in Empirical Software Engineering , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[62]  Jacob Krüger,et al.  Cooperative API misuse detection using correction rules , 2020, ICSE.

[63]  Chanchal Kumar Roy,et al.  RACK: Automatic API Recommendation Using Crowdsourced Knowledge , 2016, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[64]  Claire Le Goues,et al.  Automated program repair , 2019, Commun. ACM.

[65]  Miryung Kim,et al.  Discovering and representing systematic code changes , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[66]  Gabriele Bavota,et al.  How Can I Use This Method? , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[67]  Jacob Krüger,et al.  Using API-Embedding for API-Misuse Repair , 2020, ICSE.

[68]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[69]  Andreas Zeller,et al.  Detecting object usage anomalies , 2007, ESEC-FSE '07.

[70]  Ming Wen,et al.  ChangeLocator: locate crash-inducing changes based on crash reports , 2017, Empirical Software Engineering.

[71]  David Lo,et al.  Active Learning of Discriminative Subgraph Patterns for API Misuse Detection , 2022, IEEE Transactions on Software Engineering.

[72]  David Evans,et al.  Automatically inferring temporal properties for program evolution , 2004, 15th International Symposium on Software Reliability Engineering.

[73]  Xiaodong Gu,et al.  Deep Code Search , 2018, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[74]  Yi Zhang,et al.  Classifying Software Changes: Clean or Buggy? , 2008, IEEE Transactions on Software Engineering.

[75]  Premkumar T. Devanbu,et al.  Recommending random walks , 2007, ESEC-FSE '07.

[76]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[77]  Sunghun Kim,et al.  Memories of bug fixes , 2006, SIGSOFT '06/FSE-14.

[78]  Mira Mezini,et al.  MUBench: A Benchmark for API-Misuse Detectors , 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).

[79]  Mohamed Aymen Saied,et al.  Towards assisting developers in API usage by automated recovery of complex temporal patterns , 2020, Inf. Softw. Technol..

[80]  Mira Mezini,et al.  Investigating Next Steps in Static API-Misuse Detection , 2019, 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR).

[81]  Kajal T. Claypool,et al.  XSnippet: mining For sample code , 2006, OOPSLA '06.

[82]  Audris Mockus,et al.  Predicting risk of software changes , 2000, Bell Labs Technical Journal.

[83]  Seung-won Hwang,et al.  Towards an Intelligent Code Search Engine , 2010, AAAI.

[84]  Cristina V. Lopes,et al.  Oreo: detection of clones in the twilight zone , 2018, ESEC/SIGSOFT FSE.

[85]  Jian Pei,et al.  MAPO: Mining and Recommending API Usage Patterns , 2009, ECOOP.

[86]  Claire Le Goues,et al.  Measuring Code Quality to Improve Specification Mining , 2012, IEEE Transactions on Software Engineering.

[87]  Thomas R. Gross,et al.  Automatic Generation of Object Usage Specifications from Large Method Traces , 2009, 2009 IEEE/ACM International Conference on Automated Software Engineering.

[88]  Lin Li,et al.  Obstacles in Using Frameworks and APIs: An Exploratory Study of Programmers' Newsgroup Discussions , 2011, 2011 IEEE 19th International Conference on Program Comprehension.

[89]  Sven Amann,et al.  A Systematic Approach to Benchmark and Improve Automated Static Detection of Java-API Misuses , 2018 .

[90]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[91]  Steven P. Reiss,et al.  Semantics-based code search , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[92]  Jacob Krüger,et al.  AndroidCompass: A Dataset of Android Compatibility Checks in Code Repositories , 2021, 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR).

[93]  Stephen McCamant,et al.  The Daikon system for dynamic detection of likely invariants , 2007, Sci. Comput. Program..

[94]  Audris Mockus,et al.  A large-scale empirical study of just-in-time quality assurance , 2013, IEEE Transactions on Software Engineering.