Enhance code search via reformulating queries with evolving contexts

To improve code search, many query expansion (QE) approaches expand a query with terms drawn from APIs or crowd knowledge. However, these approaches can degrade retrieval performance, because they cannot distinguish relevant terms from irrelevant ones among a large set of candidate expansion terms and may therefore expand a query with irrelevant terms. In this paper, we propose QREC, a query reformulation approach with evolving contexts, which refer to the new/deleted terms and dependent terms that arise during code evolution. By treating new terms as relevant and deleted terms as irrelevant, QREC reformulates a query with appropriate expansion terms. The experimental results show that QREC outperforms the state-of-the-art QE approaches (e.g., CodeHow and QECK) by 9–11% and improves the precision of the code search algorithms IR, Portfolio and VF by up to 37–45%.
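The core idea, treating terms added during code evolution as relevant and terms deleted as irrelevant, can be illustrated with a minimal sketch. This is not the QREC implementation: the function names (`evolving_context`, `reformulate`), the scoring rule (added count minus deleted count), and the `max_terms` cutoff are all assumptions made for illustration, and dependent terms are not modeled here.

```python
from collections import Counter


def evolving_context(changes):
    """Count how often each term is added or deleted across code changes.

    `changes` is an iterable of (old_terms, new_terms) pairs, one per
    change. This input format is a hypothetical simplification.
    """
    added, deleted = Counter(), Counter()
    for old_terms, new_terms in changes:
        old_set, new_set = set(old_terms), set(new_terms)
        added.update(new_set - old_set)    # terms introduced by the change
        deleted.update(old_set - new_set)  # terms removed by the change
    return added, deleted


def reformulate(query, candidates, added, deleted, max_terms=3):
    """Expand `query` with candidates that are added more often than deleted.

    The score (added count minus deleted count) is an assumed heuristic,
    not the weighting used in the paper.
    """
    query_terms = query.lower().split()
    scored = []
    for term in candidates:
        if term in query_terms:
            continue  # already part of the query
        score = added[term] - deleted[term]
        if score > 0:  # keep only terms that look relevant
            scored.append((score, term))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return query_terms + [term for _, term in scored[:max_terms]]


changes = [
    (["read", "file", "stream"], ["read", "file", "buffer"]),
    (["open", "stream"], ["open", "buffer", "path"]),
]
added, deleted = evolving_context(changes)
expanded = reformulate("read file", ["stream", "buffer", "path", "file"], added, deleted)
print(expanded)
```

In this toy history, "stream" is only ever deleted, so it is excluded from the expansion, while "buffer" and "path" are added and therefore kept.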
