Improving feature location by transforming the query from natural language into requirements

Software maintenance and evolution activities have created a great demand for feature location approaches that search for relevant code in a large codebase. However, this search is usually performed manually and relies heavily on developers. In this paper, we propose a feature location approach that, instead of searching the code directly with a natural language query as other approaches do, transforms the natural language query into a query made up of the requirements identified as relevant. Furthermore, our approach narrows the code search space by selecting only the code of those products that include relevant requirements. We evaluate the overall effectiveness of our approach in the industrial domain of train control software. Our results show that our approach improves precision by 18.1% compared with searching the code directly, which encourages further research in this direction.
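The two steps of the approach (reformulating the query as relevant requirements, then restricting the search to products that include those requirements) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the retrieval technique, so plain TF-IDF cosine similarity is swapped in as the ranking function, a fixed `top_k` cut-off decides which requirements count as relevant, and each product is modelled as a set of requirement IDs plus a dictionary of code fragments. All names (`locate_feature`, `top_k`, `REQUIREMENTS`, `PRODUCTS`) and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters, so that code
    # identifiers such as apply_emergency_brake become separate words.
    # (No stop-word removal or stemming in this sketch.)
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_idf(docs):
    # Smoothed inverse document frequency over a list of token lists.
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def vectorize(tokens, idf):
    # TF-IDF weights; terms unseen in the corpus are ignored.
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, texts):
    # Return (index, score) pairs sorted by descending cosine similarity.
    docs = [tokenize(t) for t in texts]
    idf = build_idf(docs)
    q = vectorize(tokenize(query), idf)
    scored = [(i, cosine(q, vectorize(d, idf))) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def locate_feature(nl_query, requirements, products, top_k=2):
    # Step 1: rank requirements against the natural language query and keep
    # the top_k with a non-zero score as the "relevant" requirements.
    req_ids = list(requirements)
    ranked_reqs = rank(nl_query, [requirements[r] for r in req_ids])
    relevant = {req_ids[i] for i, score in ranked_reqs[:top_k] if score > 0}
    if not relevant:
        return []
    # The reformulated query is made up of the relevant requirements' text.
    req_query = " ".join(requirements[r] for r in sorted(relevant))
    # Step 2: search only the code of products that include a relevant requirement.
    scope = [
        (pid, fid, code)
        for pid, product in products.items()
        if product["requirements"] & relevant
        for fid, code in product["code"].items()
    ]
    if not scope:
        return []
    ranked_code = rank(req_query, [code for _, _, code in scope])
    return [((scope[i][0], scope[i][1]), score) for i, score in ranked_code]


# Hypothetical toy data loosely inspired by the train control domain.
REQUIREMENTS = {
    "R1": "The train shall apply the emergency brake when speed exceeds the limit",
    "R2": "The door controller shall open the doors only when the train is stopped",
    "R3": "The display shall show the current speed to the driver",
}
PRODUCTS = {
    "P1": {
        "requirements": {"R1", "R3"},
        "code": {
            "brake.c": "void apply_emergency_brake(int speed, int limit) { if (speed > limit) brake(); }",
            "display.c": "void show_speed(int speed) { display(speed); }",
        },
    },
    "P2": {
        "requirements": {"R2"},
        "code": {"doors.c": "void open_doors(bool stopped) { if (stopped) open(); }"},
    },
}

if __name__ == "__main__":
    for (pid, fid), score in locate_feature("emergency brake on overspeed", REQUIREMENTS, PRODUCTS):
        print(pid, fid, round(score, 3))
```

With the sample query, only requirement R1 scores above zero, so product P2 is excluded from the search and `brake.c` ranks first within P1. In practice, the choice of ranking function and of the relevance cut-off would need to be tuned to the requirements and code at hand.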
