Discovering Math APIs by Mining Unit Tests

In today's API-rich world, programmer productivity depends heavily on the programmer's ability to discover the required APIs. In this paper, we present a technique and tool, called MathFinder, to discover APIs for mathematical computations by mining unit tests of API methods. Given a math expression, MathFinder synthesizes pseudo-code to compute the expression by mapping its subexpressions to API method calls. For each subexpression, MathFinder searches for a method such that there is a mapping between the method's inputs and the variables of the subexpression. Under this mapping, the subexpression, when evaluated on the method's test inputs, should produce results that match the method's output on a large number of tests. We implemented MathFinder as an Eclipse plugin for discovering third-party Java APIs and performed a user study to evaluate its effectiveness. In the study, the use of MathFinder resulted in a 2x improvement in programmer productivity. For 96% of the subexpressions queried in the study, MathFinder retrieved the desired API method as the top-most result. The top-most pseudo-code snippet to implement the entire expression was correct in 93% of the cases. Since the number of methods and unit tests to mine could be large in practice, we also implemented MathFinder in a MapReduce framework and evaluated its scalability and response time.
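The matching step described above can be illustrated with a minimal sketch. It is not MathFinder's implementation: the names `score_method`, `rank_methods`, and `TEST_DB`, the recorded test values, the brute-force enumeration of mappings, and the numeric tolerance are all assumptions made for illustration. Each candidate method is represented only by its recorded unit tests, given as pairs of input tuple and expected output. The sketch tries every assignment of method parameters to expression variables and scores a method by the fraction of tests on which the expression reproduces the recorded output.

```python
import math
from itertools import permutations

# Hypothetical mined unit tests: method name -> list of (inputs, expected output).
TEST_DB = {
    "pow":   [((2.0, 3.0), 8.0), ((3.0, 2.0), 9.0), ((2.0, 0.5), math.sqrt(2.0))],
    "hypot": [((3.0, 4.0), 5.0), ((5.0, 12.0), 13.0)],
    "atan2": [((1.0, 1.0), math.pi / 4), ((0.0, 1.0), 0.0)],
}


def score_method(expr, variables, tests, tol=1e-9):
    """Return (best fraction of matching tests, best mapping) for one method.

    A mapping is a dict from variable name to the index of the method
    parameter bound to it. The mapping is None if no test ever matches.
    """
    best = (0.0, None)
    if not tests or len(tests[0][0]) != len(variables):
        return best
    for perm in permutations(range(len(variables))):
        passed = 0
        for inputs, expected in tests:
            # Bind each expression variable to one of the test inputs.
            env = {v: inputs[p] for v, p in zip(variables, perm)}
            try:
                actual = expr(**env)
            except (ArithmeticError, ValueError):
                continue  # expression undefined on this input: counts as a miss
            if math.isclose(actual, expected, rel_tol=tol, abs_tol=tol):
                passed += 1
        fraction = passed / len(tests)
        if fraction > best[0]:
            best = (fraction, dict(zip(variables, perm)))
    return best


def rank_methods(expr, variables, test_db):
    """Rank methods by how often the expression agrees with their tests."""
    scored = []
    for name, tests in test_db.items():
        fraction, mapping = score_method(expr, variables, tests)
        if mapping is not None:
            scored.append((fraction, name, mapping))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored


# Query: the subexpression y^x over variables x and y.
ranked = rank_methods(lambda x, y: y ** x, ["x", "y"], TEST_DB)
for fraction, name, mapping in ranked:
    print(name, fraction, mapping)
```

In this toy data, `pow` agrees with the query on all of its tests once `x` is bound to the second parameter and `y` to the first, while `atan2` agrees on only one of its two tests by coincidence. This is why agreement on a large number of tests, rather than on a single test, is used to rank candidates.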
