Statistical learning approach for mining API usage mappings for code migration

Today, the same software product often appears on multiple platforms and devices. To address business needs, software companies develop a product in one programming language and then migrate it to another. Semi-automatic migration tools have been proposed to support that process, but they require users to manually define the mappings between the respective APIs of the libraries used in the two languages. To reduce this manual effort, we introduce StaMiner, a novel data-driven approach that statistically learns the mappings between APIs from a corpus of corresponding client code of the APIs in two languages, Java and C#. Instead of using heuristics based on the textual or structural similarity between APIs in the two languages to map API methods and classes, as existing mining approaches do, StaMiner relies on a statistical model that learns the mappings from such a corpus and provides mappings for APIs with all possible arities. Our empirical evaluation on several projects shows that StaMiner can detect API usage mappings with higher accuracy than a state-of-the-art approach. With the API mappings mined by StaMiner, Java2CSharp, an existing migration tool, could achieve higher accuracy.
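To make the idea of statistically learning API mappings from parallel client code concrete, here is a minimal sketch. The abstract does not specify StaMiner's model, so this uses IBM Model 1, a standard word-alignment model from statistical machine translation, as a stand-in and not as the authors' implementation. The function names `train_model1` and `best_mapping` and the toy corpus are hypothetical, and the API sequences are invented for illustration rather than taken from the paper's data.

```python
from collections import defaultdict


def train_model1(pairs, iterations=20):
    """Estimate t(target | source) with IBM Model 1 expectation-maximization.

    pairs: list of (java_api_sequence, csharp_api_sequence), each a list of
    API names extracted from corresponding client methods (assumed input).
    """
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {t for _, tgt in pairs for t in tgt}

    # Start from a uniform translation table.
    prob = {(t, s): 1.0 / len(tgt_vocab) for s in src_vocab for t in tgt_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-alignment counts
        total = defaultdict(float)  # expected counts per source API

        # E-step: distribute each target API over the source APIs it co-occurs with.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                if norm == 0.0:
                    continue
                for s in src:
                    frac = prob[(t, s)] / norm
                    count[(t, s)] += frac
                    total[s] += frac

        # M-step: renormalize the expected counts into probabilities.
        for (t, s) in prob:
            prob[(t, s)] = count[(t, s)] / total[s] if total[s] > 0 else 0.0

    return prob


def best_mapping(prob, src_api):
    """Return the target API with the highest learned probability for src_api."""
    candidates = {t: p for (t, s), p in prob.items() if s == src_api}
    return max(candidates, key=candidates.get)


# Invented toy corpus of parallel API usages (Java side, C# side).
toy_corpus = [
    (["FileReader.new", "BufferedReader.readLine"],
     ["StreamReader.new", "StreamReader.ReadLine"]),
    (["FileReader.new"], ["StreamReader.new"]),
    (["BufferedReader.readLine", "PrintStream.println"],
     ["StreamReader.ReadLine", "Console.WriteLine"]),
    (["PrintStream.println"], ["Console.WriteLine"]),
]

table = train_model1(toy_corpus)
for java_api in ["FileReader.new", "BufferedReader.readLine", "PrintStream.println"]:
    print(java_api, "->", best_mapping(table, java_api))
```

No name similarity is used here: the pairing emerges only from co-occurrence across the parallel usages, which is the contrast with heuristic approaches that the abstract draws. This sketch learns one-to-one mappings only and does not cover the mappings of other arities that the abstract mentions.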
