M3: Semantic API Migrations

Library migration is a challenging problem, and most existing approaches rely on prior knowledge, such as information derived from changelogs or statistical models of API usage. This paper addresses a different API migration scenario in which there is no prior knowledge of the target library: no historical changelogs and no access to its internal representation. To tackle this problem, we propose a novel approach, M3, which uses probabilistic program synthesis to semantically model the behavior of library functions. We then use an SMT-based code search engine to discover similar code in user applications; these discovered instances are candidate locations for API migration. We evaluate our approach against 7 well-known libraries from varied application domains, learning correct implementations for 94 functions. Our approach is integrated with standard compiler tooling, and we use this integration to evaluate migration opportunities in 9 existing C/C++ applications totalling over 1 MLoC. We discover over 7,000 instances of these functions, of which more than 2,000 represent migration opportunities.
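The two-step pipeline (learn a model of an opaque library function, then search user code for equivalent fragments) can be illustrated with a deliberately small sketch. This is not the authors' implementation. Assumptions: the "library" is a hypothetical opaque Python function `black_box`; candidate programs come from a hand-written three-part reduction grammar whose prior weights are made up; candidates are enumerated in decreasing prior probability rather than sampled. The search step replaces the SMT-based matching with plain random differential testing, which can accept functions that merely agree on the sampled inputs, something an SMT equivalence check is designed to rule out.

```python
import itertools
import random


def black_box(xs, ys):
    # Hypothetical library routine: we may call it, but treat its body as opaque.
    return sum(x * y for x, y in zip(xs, ys))


# Hypothetical candidate space: acc = INIT; for each i: acc = COMBINE(acc, ELEM(xs[i], ys[i]))
# Each component carries a hand-picked prior weight (an assumption, not learned).
ELEM = {
    "x*y": (0.4, lambda x, y: x * y),
    "x+y": (0.3, lambda x, y: x + y),
    "x-y": (0.2, lambda x, y: x - y),
    "x": (0.1, lambda x, y: x),
}
COMBINE = {
    "acc+e": (0.6, lambda a, e: a + e),
    "max(acc,e)": (0.4, max),
}
INIT = {"0": (0.7, 0), "1": (0.3, 1)}


def run(prog, xs, ys):
    """Interpret a candidate program (init, combine, elem) on two equal-length lists."""
    init, comb, elem = prog
    acc = INIT[init][1]
    for x, y in zip(xs, ys):
        acc = COMBINE[comb][1](acc, ELEM[elem][1](x, y))
    return acc


def prior(prog):
    """Prior probability of a candidate: product of its component weights."""
    init, comb, elem = prog
    return INIT[init][0] * COMBINE[comb][0] * ELEM[elem][0]


def random_inputs(rng, n):
    """Generate n random pairs of equal-length integer lists."""
    out = []
    for _ in range(n):
        length = rng.randint(0, 6)
        xs = [rng.randint(-9, 9) for _ in range(length)]
        ys = [rng.randint(-9, 9) for _ in range(length)]
        out.append((xs, ys))
    return out


def synthesize(oracle, seed=0):
    """Return the most probable candidate consistent with the oracle's I/O examples."""
    rng = random.Random(seed)
    examples = [(xs, ys, oracle(xs, ys)) for xs, ys in random_inputs(rng, 20)]
    candidates = sorted(itertools.product(INIT, COMBINE, ELEM), key=prior, reverse=True)
    for prog in candidates:
        if all(run(prog, xs, ys) == out for xs, ys, out in examples):
            return prog
    return None


def find_matches(prog, user_functions, seed=1):
    """Stand-in for SMT-based search: random differential testing only."""
    rng = random.Random(seed)
    # One fixed non-empty case plus random ones, so an all-empty draw cannot pass everything.
    tests = [([1, 2], [3, 4])] + random_inputs(rng, 50)
    return [name for name, fn in user_functions.items()
            if all(fn(xs, ys) == run(prog, xs, ys) for xs, ys in tests)]


# Hypothetical "user application" code to search.
def user_dot(a, b):
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def user_sum(a, b):
    return sum(a) + sum(b)


prog = synthesize(black_box)
print(prog)  # ('0', 'acc+e', 'x*y')
print(find_matches(prog, {"user_dot": user_dot, "user_sum": user_sum}))  # ['user_dot']
```

Ordering candidates by prior probability means the synthesizer tries the likeliest programs first and returns the first one that reproduces every observed input/output pair; `user_dot` is then flagged as a migration candidate because it behaves like the learned model, even though it shares no code with `black_box`.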
