Interactive Query Reformulation for Source-Code Search With Word Relations

Searching source code is a common activity in many software engineering tasks. To some extent, the quality of the query determines the accuracy of the results. In practice, it is difficult for developers to formulate a high-quality query, especially for novices who have only recently taken over a software project. Moreover, existing code search techniques that accept natural-language queries offer little support for helping developers determine whether the search results are relevant. When a query performs poorly, it has to be reformulated. In this paper, we present a novel approach, INQRES, that interactively reformulates the search query by considering the relations between words in the source code in order to improve query quality. INQRES analyzes keyword relations in the source code and interactively builds AND and OR relations, from which the developer selects suitable words for query reformulation. To evaluate the effectiveness of INQRES, we perform an empirical study on the jEdit project. The results show that INQRES can effectively reformulate the search query, and that its reformulated queries are of higher quality than those of the state-of-the-art technique QReformu.
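
To make the idea of AND and OR relations concrete, the following is a minimal sketch and not the INQRES implementation. It assumes that each method is reduced to a set of terms, that candidate words are suggested by plain co-occurrence counts, and that a reformulated query is a conjunction of OR-groups. The names `cooccurring_terms` and `search` and the toy corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cooccurring_terms(corpus, word):
    """Count terms that appear in the same document as `word`.

    Assumption: raw co-occurrence stands in for whatever word-relation
    analysis the real approach performs.
    """
    counts = Counter()
    for terms in corpus.values():
        if word in terms:
            counts.update(t for t in terms if t != word)
    return counts


def search(corpus, query):
    """Return the names of documents matching a boolean query.

    `query` is a list of OR-groups (sets of terms). A document matches
    when every group (AND) shares at least one term (OR) with it.
    """
    return sorted(
        name
        for name, terms in corpus.items()
        if all(any(t in terms for t in group) for group in query)
    )


# Hypothetical toy corpus: method name -> terms extracted from its code.
corpus = {
    "Buffer.save": {"buffer", "save", "file", "write"},
    "Buffer.load": {"buffer", "load", "file", "read"},
    "View.paint": {"view", "paint", "screen"},
    "Macro.store": {"macro", "store", "file", "write"},
}

initial = [{"save"}]                          # save
reformulated = [{"save", "store"}, {"file"}]  # (save OR store) AND file

initial_hits = search(corpus, initial)
reformulated_hits = search(corpus, reformulated)
suggestions = cooccurring_terms(corpus, "file")
```

In an interactive setting, the counts in `suggestions` could be shown to the developer, who then decides whether a suggested word joins an existing OR-group (broadening the query) or becomes a new AND-group (narrowing it).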
