Mining Unit Tests for Discovery and Migration of Math APIs

Today's programming languages are supported by powerful third-party APIs. For a given application domain, it is common to have many competing APIs that provide similar functionality. Programmer productivity therefore depends heavily on the programmer's ability to discover suitable APIs, both during initial coding and during software maintenance. The aim of this work is to support the discovery and migration of math APIs. Math APIs are at the heart of many application domains, ranging from machine learning to scientific computation. Our approach, called MathFinder, combines executable specifications of mathematical computations with unit tests (operational specifications) of API methods. Given a math expression, MathFinder mines the unit tests of API methods to synthesize pseudo-code, composed of API method calls, that computes the expression. We present a sequential version of our unit test mining algorithm and also design a more scalable data-parallel version. We evaluate MathFinder extensively (1) for API discovery, where math algorithms are to be implemented from scratch, and (2) for API migration, where client programs that use one math API are to be migrated to another API. We evaluated the precision and recall of MathFinder on a diverse collection of math expressions, culled from algorithms used in a wide range of application areas such as control systems and structural dynamics. In a user study to evaluate the productivity gains obtained by using MathFinder for API discovery, the programmers who used MathFinder finished their programming tasks twice as fast as their counterparts who used conventional techniques such as web and code search, IDE code completion, and manual inspection of library documentation. For the problem of API migration, we used MathFinder in a case study to migrate Weka, a popular machine learning library.
Overall, our evaluation shows that MathFinder is easy to use, provides highly precise results across several math APIs and application domains even with a small number of unit tests per method, and scales to large collections of unit tests.
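To make the core idea concrete, the following is a minimal sketch of matching an executable specification against unit tests. It is not MathFinder's actual algorithm: the `UnitTest` record layout, the `rank_methods` helper, the `Calc.*` method names, and the scoring rule (the fraction of a method's tests that the specification reproduces) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record of one unit test: (method name, input values, expected output).
UnitTest = Tuple[str, Sequence[float], float]


def rank_methods(
    spec: Callable[..., float],
    tests: List[UnitTest],
    tol: float = 1e-9,
) -> List[Tuple[str, float]]:
    """Score each method by the fraction of its unit tests the spec reproduces."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for method, inputs, expected in tests:
        total[method] = total.get(method, 0) + 1
        try:
            # Run the executable specification on the test's inputs.
            actual = spec(*inputs)
        except (TypeError, ValueError, ZeroDivisionError):
            # The spec does not apply to these inputs (e.g. wrong arity).
            continue
        if abs(actual - expected) <= tol:
            passed[method] = passed.get(method, 0) + 1
    scores = [(m, passed.get(m, 0) / total[m]) for m in total]
    # Highest score first; ties are broken alphabetically by method name.
    return sorted(scores, key=lambda s: (-s[1], s[0]))


# Made-up unit tests for a hypothetical calculator API.
tests: List[UnitTest] = [
    ("Calc.add", (2.0, 3.0), 5.0),
    ("Calc.add", (1.0, -1.0), 0.0),
    ("Calc.multiply", (2.0, 3.0), 6.0),
    ("Calc.multiply", (2.0, 2.0), 4.0),
    ("Calc.negate", (4.0,), -4.0),
]

# Executable specification of the math expression a * b.
ranking = rank_methods(lambda a, b: a * b, tests)
print(ranking)
```

In this toy setting, only the method whose recorded test outputs agree with the specification on every test receives a score of 1.0. A realistic system would also need to handle matrix-valued arguments, argument reordering, and methods with side effects, none of which this sketch attempts.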
