Typestate-based semantic code search over partial programs

We present a novel code search approach for answering queries focused on API-usage with code showing how the API should be used. To construct a search index, we develop new techniques for statically mining and consolidating temporal API specifications from code snippets. In contrast to existing semantic-based techniques, our approach handles partial programs in the form of code snippets. Handling snippets allows us to consume code from various sources such as parts of open source projects, educational resources (e.g. tutorials), and expert code sites. To handle code snippets, our approach (i) extracts a possibly partial temporal specification from each snippet using a relatively precise static analysis tracking a generalized notion of typestate, and (ii) consolidates the partial temporal specifications, combining consistent partial information to yield consolidated temporal specifications, each of which captures a full(er) usage scenario. To answer a search query, we define a notion of relaxed inclusion matching a query against temporal specifications and their corresponding code snippets. We have implemented our approach in a tool called PRIME and applied it to search for API usage of several challenging APIs. PRIME was able to analyze and consolidate thousands of snippets per tested API, and our results indicate that the combination of a relatively precise analysis and consolidation allowed PRIME to answer challenging queries effectively.

[1]  James R. Larus,et al.  Mining specifications , 2002, POPL '02.

[2]  George C. Necula,et al.  Mining Temporal Specifications for Error Detection , 2005, TACAS.

[3]  Manuvir Das,et al.  Perracotta: mining temporal API rules from imperfect traces , 2006, ICSE.

[4]  Jens Krinke,et al.  Identifying similar code with program dependence graphs , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[5]  Steven P. Reiss,et al.  Semantics-based code search , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[6]  Andreas Zeller,et al.  Mining temporal specifications from object usage , 2011, Automated Software Engineering.

[7]  Laurie Hendren,et al.  Soot: a Java bytecode optimization framework , 2010, CASCON.

[8]  Patrick Cousot,et al.  Modular Static Program Analysis , 2002, CC.

[9]  Eric Bodden,et al.  Effective API navigation and reuse , 2010, 2010 IEEE International Conference on Information Reuse & Integration.

[10]  Jonathan Aldrich,et al.  An Empirical Study of Object Protocols in the Wild , 2011, ECOOP.

[11]  Stéphane Ducasse,et al.  A language independent approach for detecting duplicated code , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[12]  Armando Solar-Lezama,et al.  Programming by sketching for bit-streaming programs , 2005, PLDI '05.

[13]  R. Holmes,et al.  Using structural context to recommend source code examples , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[14]  Andreas Zeller,et al.  Learning from 6,000 projects: lightweight cross-project anomaly detection , 2010, ISSTA '10.

[15]  Zhendong Su,et al.  Javert: fully automatic mining of general temporal properties from dynamic traces , 2008, SIGSOFT '08/FSE-16.

[16]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[17]  Robert E. Strom,et al.  Typestate: A programming language concept for enhancing software reliability , 1986, IEEE Transactions on Software Engineering.

[18]  Andreas Zeller,et al.  Mining object behavior with ADABU , 2006, WODA '06.

[19]  Zhendong Su,et al.  Scalable detection of semantic clones , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[20]  Rastislav Bodík,et al.  Jungloid mining: helping to navigate the API jungle , 2005, PLDI '05.

[21]  Tao Xie,et al.  Parseweb: a programmer assistant for reusing open source code on the web , 2007, ASE.

[22]  Eran Yahav,et al.  Effective typestate verification in the presence of aliasing , 2006, TSEM.

[23]  Siau-Cheng Khoo,et al.  SMArTIC: towards building an accurate, robust and scalable specification miner , 2006, SIGSOFT '06/FSE-14.

[24]  Robert J. Walker,et al.  Strathcona example recommendation tool , 2005, ESEC/FSE-13.

[25]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[26]  Monica S. Lam,et al.  Automatic extraction of object-oriented component interfaces , 2002, ISSTA '02.

[27]  Seung-won Hwang,et al.  Towards an Intelligent Code Search Engine , 2010, AAAI.

[28]  Katsuro Inoue,et al.  Very-Large Scale Code Clone Analysis and Visualization of Open Source Programs Using Distributed CCFinder: D-CCFinder , 2007, 29th International Conference on Software Engineering (ICSE'07).

[29]  Dietmar Seipel,et al.  Clone detection in source code by frequent itemset techniques , 2004 .

[30]  Laurie J. Hendren,et al.  Enabling static analysis for partial java programs , 2008, OOPSLA.

[31]  Pavol Cerný,et al.  Synthesis of interface specifications for Java classes , 2005, POPL '05.

[32]  Jürgen Wolff von Gudenberg,et al.  Clone detection in source code by frequent itemset techniques , 2004, Source Code Analysis and Manipulation, Fourth IEEE International Workshop on.

[33]  Alexander L. Wolf,et al.  Discovering models of software processes from event-based data , 1998, TSEM.

[34]  Mira Mezini,et al.  Detecting Missing Method Calls in Object-Oriented Software , 2010, ECOOP.

[35]  Susan Horwitz,et al.  Using Slicing to Identify Duplication in Source Code , 2001, SAS.

[36]  Kajal T. Claypool,et al.  XSnippet: mining For sample code , 2006, OOPSLA '06.

[37]  Jian Pei,et al.  Mining API patterns as partial orders from source code: from usage scenarios to specifications , 2007, ESEC-FSE '07.

[38]  Eran Yahav,et al.  Static Specification Mining Using Automata-Based Abstractions , 2007, IEEE Transactions on Software Engineering.

[39]  Andreas Zeller,et al.  Mining temporal specifications from object usage , 2009, 2009 IEEE/ACM International Conference on Automated Software Engineering.

[40]  Andreas Zeller,et al.  Detecting object usage anomalies , 2007, ESEC-FSE '07.

[41]  Jian Pei,et al.  MAPO: Mining and Recommending API Usage Patterns , 2009, ECOOP.