Automatic query reformulations for feature location in a model-based family of software products

Abstract No maintenance activity can be completed without Feature Location (FL), which is finding the set of software artifacts that realize a particular functionally. Despite the importance of FL, the vast majority of work has been focused on retrieving code, whereas other software artifacts such as the models have been neglected. Furthermore, locating a piece of information from a query in a large repository is a challenging task as it requires knowledge of the vocabulary used in the software artifacts. This can be alleviated by automatically reformulating the query (adding or removing terms). In this paper, we test four existing query reformulation techniques, which perform the best for FL in code but have never been used for FL in models. Specifically, we test these techniques in two industrial domains: a model-based family of firmwares for induction hobs, and a model-based family of PLC software to control trains. We compare the results provided by our FL approach using the query and the reformulated queries by means of statistical analysis. Our results show that reformulated queries do not improve the performance in models, which could lead towards a new direction in the creation or reconsideration of these techniques to be applied in models.

[1]  Emily Hill,et al.  Improving source code search with natural language phrasal representations of method signatures , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[2]  John Mylopoulos,et al.  Learning to Rank for Question-Oriented Software Text Retrieval (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[3]  Andrian Marcus,et al.  An information retrieval approach to concept location in source code , 2004, 11th Working Conference on Reverse Engineering.

[4]  Emily Hill,et al.  Identifying Word Relations in Software: A Comparative Study of Semantic Similarity Tools , 2008, 2008 16th IEEE International Conference on Program Comprehension.

[5]  Gabriele Bavota,et al.  Automatic query reformulations for text retrieval in software engineering , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[6]  Christoph Pohl,et al.  An Exploratory Study of Information Retrieval Techniques in Domain Analysis , 2008, 2008 12th International Software Product Line Conference.

[7]  Meir M. Lehman,et al.  A Paradigm for the Behavioural Modelling of Software Processes using System Dynamics , 2001 .

[8]  Mark Harman,et al.  The Oracle Problem in Software Testing: A Survey , 2015, IEEE Transactions on Software Engineering.

[9]  Lionel C. Briand,et al.  A Hitchhiker's guide to statistical tests for assessing randomized algorithms in software engineering , 2014, Softw. Test. Verification Reliab..

[10]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[11]  W. Bruce Croft,et al.  Discovering key concepts in verbose queries , 2008, SIGIR '08.

[12]  Bogdan Dit,et al.  Feature location in source code: a taxonomy and survey , 2013, J. Softw. Evol. Process..

[13]  R. Grissom,et al.  Effect sizes for research: A broad practical approach. , 2005 .

[14]  Jaime Font,et al.  Feature location in models through a genetic algorithm driven by information retrieval techniques , 2016, MoDELS.

[15]  Lionel C. Briand,et al.  aToucan: An Automated Framework to Derive UML Analysis Models from Use Case Models , 2015, TSEM.

[16]  Birger Møller-Pedersen,et al.  Adding Standardized Variability to Domain Specific Languages , 2008, 2008 12th International Software Product Line Conference.

[17]  Yann-Gaël Guéhéneuc,et al.  Feature Location Using Probabilistic Ranking of Methods Based on Execution Scenarios and Information Retrieval , 2007, IEEE Transactions on Software Engineering.

[18]  Dongmei Zhang,et al.  CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[19]  Peter W. Foltz,et al.  An introduction to latent semantic analysis , 1998 .

[20]  A. Vargha,et al.  A Critique and Improvement of the CL Common Language Effect Size Statistics of McGraw and Wong , 2000 .

[21]  Denys Poshyvanyk,et al.  Feature location via information retrieval based filtering of a single scenario execution trace , 2007, ASE.

[22]  Jinqiu Yang,et al.  Inferring semantically related words from software context , 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).

[23]  Oscar Pastor,et al.  Analyzing the impact of natural language processing over feature location in models , 2017, GPCE.

[24]  Bogdan Dit,et al.  Using Data Fusion and Web Mining to Support Feature Location in Software , 2010, 2010 IEEE 18th International Conference on Program Comprehension.

[25]  James Allan,et al.  Effective and efficient user interaction for long queries , 2008, SIGIR '08.

[26]  Jane Cleland-Huang,et al.  On-demand feature recommendations derived from mining public product descriptions , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[27]  Per Runeson,et al.  Guidelines for conducting and reporting case study research in software engineering , 2009, Empirical Software Engineering.

[28]  Tim Menzies,et al.  Scalable product line configuration: A straw to break the camel's back , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[29]  X. Allan Lu,et al.  Query Expansion/Reduction and its Impact on Retrieval Effectiveness , 1994, TREC.

[30]  Claes Wohlin,et al.  Experimentation in Software Engineering , 2012, Springer Berlin Heidelberg.

[31]  Gregory M. Kapfhammer,et al.  Parameter Tuning for Search-Based Test-Data Generation Revisited: Support for Previous Results , 2014, 2014 14th International Conference on Quality Software.

[32]  David Lo,et al.  Automated construction of a software-specific word similarity database , 2014, 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE).

[33]  Avinash C. Kak,et al.  Assisting code search with automatic Query Reformulation for bug localization , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[34]  Francisco Herrera,et al.  Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power , 2010, Inf. Sci..

[35]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[36]  Silvio Romero de Lemos Meira,et al.  Combining rule-based and information retrieval techniques to assign software change requests , 2014, ASE.

[37]  Gerard Salton,et al.  The SMART Retrieval System—Experiments in Automatic Document Processing , 1971 .

[38]  Emily Hill,et al.  Automatically capturing source code context of NL-queries for software maintenance and reuse , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[39]  Bran Selic,et al.  The Pragmatics of Model-Driven Development , 2003, IEEE Softw..

[40]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[41]  Tim Menzies,et al.  On the use of relevance feedback in IR-based concept location , 2009, 2009 IEEE International Conference on Software Maintenance.

[42]  Jane Cleland-Huang,et al.  Learning effective query transformations for enhanced requirements trace retrieval , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[43]  Vitor R. Carvalho,et al.  Reducing long queries using query quality predictors , 2009, SIGIR.

[44]  Claudio Carpineto,et al.  A Survey of Automatic Query Expansion in Information Retrieval , 2012, CSUR.

[45]  Mira Mezini,et al.  Querying source code with natural language , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[46]  Gordon Fraser,et al.  Parameter tuning or default values? An empirical investigation in search-based software engineering , 2013, Empirical Software Engineering.

[47]  David Lo,et al.  Active code search: incorporating user feedback to improve code search relevance , 2014, ASE.