SWIM: Synthesizing What I Mean - Code Search and Idiomatic Snippet Synthesis

Modern programming frameworks ship with large libraries that serve diverse purposes, such as matching regular expressions, parsing XML files, and sending email. Programmers often use search engines such as Google and Bing to learn about existing APIs. In this paper, we describe SWIM, a tool that suggests code snippets in response to API-related natural language queries such as "generate md5 hash code". We translate user queries into the APIs of interest using clickthrough data from the Bing search engine. Then, based on patterns learned from open-source code repositories, we synthesize idiomatic code that demonstrates the use of these APIs. We introduce structured call sequences to capture API-usage patterns. Structured call sequences generalize method call sequences with if-branches and while-loops, which represent conditional and repeated API usage; they are simple to extract and amenable to synthesis. We evaluated SWIM on 30 common C# API-related queries received by Bing. For 70% of the queries, the first suggested snippet was a relevant solution, and a relevant solution was present in the top 10 results for all benchmarked queries. The online portion of the workflow is also responsive, taking an average of 1.5 seconds per snippet.
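
To make the idea of a structured call sequence concrete, here is a minimal Python sketch of one possible encoding: a list of nodes in which a plain call, an if-branch, and a while-loop are the only constructs. The class names (Call, If, While) and the helpers render and flatten are hypothetical and chosen for illustration only; they are not SWIM's actual representation, and the XmlReader usage pattern is a hand-written example rather than one mined from repositories.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Call:
    """A single API method call, named as Type.Method."""
    method: str


@dataclass
class If:
    """Conditional usage: the body runs only if the guard call succeeds."""
    cond: Call
    body: List["Node"] = field(default_factory=list)


@dataclass
class While:
    """Repeated usage: the body runs while the guard call succeeds."""
    cond: Call
    body: List["Node"] = field(default_factory=list)


Node = Union[Call, If, While]


def render(seq: List[Node], indent: int = 0) -> List[str]:
    """Pretty-print a structured call sequence as schematic C#-like lines."""
    pad = "    " * indent
    lines = []
    for node in seq:
        if isinstance(node, Call):
            lines.append(f"{pad}{node.method}();")
        else:
            keyword = "if" if isinstance(node, If) else "while"
            lines.append(f"{pad}{keyword} ({node.cond.method}()) {{")
            lines.extend(render(node.body, indent + 1))
            lines.append(f"{pad}}}")
    return lines


def flatten(seq: List[Node]) -> List[str]:
    """Drop the control structure, leaving a plain method call sequence."""
    out = []
    for node in seq:
        if isinstance(node, Call):
            out.append(node.method)
        else:
            out.append(node.cond.method)
            out.extend(flatten(node.body))
    return out


# Hand-written illustrative pattern for reading attributes from an XML file.
example = [
    Call("XmlReader.Create"),
    While(Call("XmlReader.Read"), [
        If(Call("XmlReader.IsStartElement"), [
            Call("XmlReader.GetAttribute"),
        ]),
    ]),
    Call("XmlReader.Close"),
]

if __name__ == "__main__":
    print("\n".join(render(example)))
```

Flattening the example yields the ordinary call sequence Create, Read, IsStartElement, GetAttribute, Close, which shows in what sense the structured form generalizes a flat method call sequence: the loop and branch add information about repetition and guarding that the flat list cannot express.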
