Paper information - Enhance code search via reformulating queries with evolving contexts

To improve code search, many query expansion (QE) approaches expand a query with terms drawn from APIs or crowd knowledge. However, these approaches can sometimes hurt retrieval performance, because they cannot distinguish relevant terms from irrelevant ones among a large set of candidate expansion terms and may therefore expand a query with irrelevant terms. In this paper, we propose QREC, a query reformulation approach based on evolving contexts, that is, the new terms, deleted terms, and dependent terms that arise during code evolution. By treating new terms as relevant and deleted terms as irrelevant, QREC can reformulate a query with appropriate expansion terms. The experimental results show that QREC outperforms state-of-the-art QE approaches (e.g., CodeHow and QECK) by 9–11% and improves the precision of the code search algorithms IR, Portfolio, and VF by up to 37–45%.
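The core idea in the abstract, treating terms added during code evolution as relevant and terms deleted as irrelevant, can be illustrated with a small Rocchio-style sketch. This is not the QREC algorithm itself, whose details the abstract does not give. The function names (`tokenize`, `evolving_context`, `reformulate`), the weights `beta` and `gamma`, the `top_k` cutoff, and the sample code changes are all assumptions made for illustration, and dependent terms are ignored entirely.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase terms, breaking camelCase and snake_case identifiers."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
        terms.extend(p.lower() for p in parts)
    return terms


def evolving_context(old_code, new_code):
    """Return (added, deleted) term counts between two versions of a snippet."""
    old, new = Counter(tokenize(old_code)), Counter(tokenize(new_code))
    # Counter subtraction keeps only positive counts.
    return new - old, old - new


def reformulate(query, changes, top_k=3, beta=1.0, gamma=1.0):
    """Expand a query with terms that code changes added more often than they deleted.

    `changes` is a list of (old_code, new_code) pairs. Only changes that
    mention at least one query term contribute (an assumed relevance filter).
    """
    q_terms = tokenize(query)
    scores = Counter()
    for old_code, new_code in changes:
        touched = set(tokenize(old_code)) | set(tokenize(new_code))
        if not touched & set(q_terms):
            continue
        added, deleted = evolving_context(old_code, new_code)
        for term, count in added.items():
            scores[term] += beta * count    # new terms count as relevant
        for term, count in deleted.items():
            scores[term] -= gamma * count   # deleted terms count as irrelevant
    candidates = [(t, s) for t, s in scores.items() if s > 0 and t not in q_terms]
    candidates.sort(key=lambda ts: (-ts[1], ts[0]))
    return q_terms + [t for t, _ in candidates[:top_k]]


# Hypothetical change history: two edits to file-reading code.
changes = [
    ("reader = FileReader(path)\ndata = reader.read()",
     "reader = BufferedReader(FileReader(path))\ndata = reader.readLine()"),
    ("stream = FileInputStream(path)\nstream.read(buffer)",
     "channel = FileChannel.open(path)\nchannel.read(buffer)"),
]
print(reformulate("read file", changes))
```

In this toy history, a term such as `stream` that was removed during evolution receives a negative score and is never offered as an expansion term, whereas a plain co-occurrence-based expansion could still pick it up.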

Qing Huang | Guoqing Wu

[1] Xiao Ma, et al. From Word Embeddings to Document Similarities for Improved Information Retrieval in Software Engineering, 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[2] Emily Hill, et al. Identifying Word Relations in Software: A Comparative Study of Semantic Similarity Tools, 2008, 2008 16th IEEE International Conference on Program Comprehension.

[3] Gabriele Bavota, et al. Automatic query reformulations for text retrieval in software engineering, 2013, 2013 35th International Conference on Software Engineering (ICSE).

[4] Edward A. Fox, et al. Research Contributions, 2014.

[5] Lori L. Pollock, et al. Automatically mining software-based, semantically-similar words from comment-code mappings, 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[6] Kathryn T. Stolee, et al. How developers search for code: a case study, 2015, ESEC/SIGSOFT FSE.

[7] Ying Zou, et al. Spotting working code examples, 2014, ICSE.

[8] Hinrich Schütze, et al. Introduction to information retrieval, 2008.

[9] Collin McMillan, et al. Exemplar: A Source Code Search Engine for Finding Highly Relevant Applications, 2012, IEEE Transactions on Software Engineering.

[10] David Notkin, et al. Editorial—looking back, 2013, TSEM.

[11] Andrian Marcus, et al. Using Observed Behavior to Reformulate Queries during Text Retrieval-based Bug Localization, 2017, 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[12] Hongfei Lin, et al. Assessment of learning to rank methods for query expansion, 2016, J. Assoc. Inf. Sci. Technol.

[13] Sushil Krishna Bajracharya, et al. CodeGenie: using test-cases to search and reuse source code, 2007, ASE '07.

[14] Junwu Zhu, et al. Empirical studies on the NLP techniques for source code data preprocessing, 2014, EAST 2014.

[15] Charles L. A. Clarke, et al. Archetypal source code searches: a survey of software developers and maintainers, 1998, Proceedings. 6th International Workshop on Program Comprehension. IWPC'98 (Cat. No.98TB100242).

[16] Gerhard Fischer, et al. Cognitive tools for locating and comprehending software objects for reuse, 1991, [1991 Proceedings] 13th International Conference on Software Engineering.

[17] P. K. Lawlis, et al. Automated software engineering planning with SASEA, 1998.

[18] Xiaochen Li, et al. Query Expansion Based on Crowd Knowledge for Code Search, 2016, IEEE Transactions on Services Computing.

[19] Kathryn T. Stolee, et al. Solving the Search for Source Code, 2014, ACM Trans. Softw. Eng. Methodol.

[20] David Lo, et al. SEWordSim: software-specific word similarity database, 2014, ICSE Companion.

[21] Claudio Carpineto, et al. A Survey of Automatic Query Expansion in Information Retrieval, 2012, CSUR.

[22] Eunseok Lee, et al. Improved bug localization based on code change histories and bug reports, 2017, Inf. Softw. Technol.

[23] Collin McMillan, et al. Portfolio: Searching for relevant functions and their usages in millions of lines of code, 2013, TSEM.

[24] Harald C. Gall, et al. Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction, 2007, IEEE Transactions on Software Engineering.

[25] Mira Mezini, et al. Evaluating the evaluations of code recommender systems: A reality check, 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[26] Dongmei Zhang, et al. CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E), 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[27] Danny Dig, et al. API code recommendation using statistical learning from fine-grained changes, 2016, SIGSOFT FSE.