Enhancing example-based code search with functional semantics

Abstract: As open source code grows in quality and quantity, software developers increasingly need effective and efficient ways to search for code that implements given semantics (semantics-based code search) so that they can retrieve and reuse existing source code. Previous semantics-based code search techniques encode the semantics of loop-free Java code snippets as constraints and use an SMT solver to find encoded snippets that match an input/output (IO) query. This article presents Quebio, an approach to semantics-based search for Java methods. Quebio advances the state of the art by supporting important language features such as invocations of library APIs and by handling more data types, including array/List, Set, and Map. Unlike existing approaches, Quebio also integrates a customized keyword-based search that takes a textual summary of the desired behavior as input and uses it to quickly prune the methods to be checked against the IO examples. To evaluate the effectiveness and efficiency of Quebio, we constructed a repository of 14,792 methods from 723 open source Java projects hosted on GitHub and applied the approach to 47 queries extracted from StackOverflow. Quebio found methods that correctly implement the specified IO behaviors for 43 of the queries, significantly outperforming existing semantics-based code search techniques. The average search time with Quebio was 213.2 seconds per query.
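The two-stage idea described in the abstract (prune by keywords, then check the survivors against IO examples) can be illustrated with a minimal sketch. This is not Quebio's implementation: it runs each candidate directly on the example inputs instead of encoding Java methods as constraints for an SMT solver, and it uses Python callables in place of indexed Java methods. The repository contents and the names `keyword_prune`, `satisfies`, `search`, and `min_overlap` are all hypothetical and exist only for illustration.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical stand-in for an indexed method: (behavioral summary, callable).
Candidate = Tuple[str, Callable[..., Any]]
# One IO example: (tuple of input arguments, expected output).
Example = Tuple[Tuple[Any, ...], Any]

# Toy repository (assumption: real entries would be Java methods, not lambdas).
REPOSITORY: List[Candidate] = [
    ("reverse the elements of a list", lambda xs: list(reversed(xs))),
    ("sort a list in ascending order", lambda xs: sorted(xs)),
    ("remove duplicate elements from a list", lambda xs: list(dict.fromkeys(xs))),
    ("sum the elements of a list", lambda xs: sum(xs)),
]


def keyword_prune(summary: str, repo: Sequence[Candidate],
                  min_overlap: int = 2) -> List[Candidate]:
    """Stage 1: keep candidates sharing at least min_overlap words with the summary."""
    terms = set(summary.lower().split())
    return [c for c in repo
            if len(terms & set(c[0].lower().split())) >= min_overlap]


def satisfies(func: Callable[..., Any], examples: Sequence[Example]) -> bool:
    """Stage 2: check a candidate against every IO example by running it.

    Assumption: direct execution replaces the constraint solving that a
    semantics-based engine would perform.
    """
    for args, expected in examples:
        try:
            if func(*args) != expected:
                return False
        except Exception:
            # A candidate that crashes on an example does not match the query.
            return False
    return True


def search(summary: str, examples: Sequence[Example],
           repo: Sequence[Candidate] = REPOSITORY) -> List[str]:
    """Return summaries of candidates that survive both stages."""
    return [desc for desc, func in keyword_prune(summary, repo)
            if satisfies(func, examples)]


if __name__ == "__main__":
    query_examples = [(([3, 1, 3],), [3, 1])]
    print(search("remove duplicate list elements", query_examples))
```

In this toy setting the keyword stage discards candidates whose summaries share fewer than two words with the query, so only the remaining candidates are executed on the examples. The overlap threshold is an arbitrary choice made for the sketch; a real keyword search would more plausibly rank candidates with a weighting scheme rather than count raw word overlap.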
