Analyzing the impact of natural language processing over feature location in models

Feature Location (FL) is a common task in the Software Engineering field, specially in maintenance and evolution of software products. The results of FL depend in a great manner in the style in which Feature Descriptions and software artifacts are written. Therefore, Natural Language Processing (NLP) techniques are used to process them. Through this paper, we analyze the influence of the most common NLP techniques over FL in Conceptual Models through Latent Semantic Indexing, and the influence of human participation when embedding domain knowledge in the process. We evaluated the techniques in a real-world industrial case study in the rolling stocks domain.

[1]  Vimala Balakrishnan,et al.  Stemming and lemmatization: A comparison of retrieval performances , 2014 .

[2]  Alexander Egyed,et al.  Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering , 2007, ASE 2007.

[3]  Lionel C. Briand,et al.  A Hitchhiker's guide to statistical tests for assessing randomized algorithms in software engineering , 2014, Softw. Test. Verification Reliab..

[4]  Per Runeson,et al.  Guidelines for conducting and reporting case study research in software engineering , 2009, Empirical Software Engineering.

[5]  Sandro Schulze,et al.  Interface variability in family model mining , 2013, SPLC '13 Workshops.

[6]  Thomas Hofmann,et al.  Probabilistic latent semantic indexing , 1999, SIGIR '99.

[7]  Maximilian Junker,et al.  Configuring Latent Semantic Indexing for Requirements Tracing , 2015, 2015 IEEE/ACM 2nd International Workshop on Requirements Engineering and Testing.

[8]  Francisco Herrera,et al.  Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power , 2010, Inf. Sci..

[9]  Jacques Klein,et al.  Bottom-up adoption of software product lines: a generic and extensible approach , 2015, SPLC.

[10]  Carlos Cetina,et al.  Towards clone-and-own support: locating relevant methods in legacy products , 2016, SPLC.

[11]  Dunja Mladenic,et al.  A Rule based Approach to Word Lemmatization , 2004 .

[12]  Yann-Gaël Guéhéneuc,et al.  Feature Location Using Probabilistic Ranking of Methods Based on Execution Scenarios and Information Retrieval , 2007, IEEE Transactions on Software Engineering.

[13]  A. Vargha,et al.  A Critique and Improvement of the CL Common Language Effect Size Statistics of McGraw and Wong , 2000 .

[14]  Birgit Vogel-Heuser,et al.  Family model mining for function block diagrams in automation software , 2014, SPLC '14.

[15]  Kevin Ryan,et al.  The role of natural language in requirements engineering , 1993, [1993] Proceedings of the IEEE International Symposium on Requirements Engineering.

[16]  Mehrdad Sabetzadeh,et al.  Change impact analysis for Natural Language requirements: An NLP approach , 2015, 2015 IEEE 23rd International Requirements Engineering Conference (RE).

[17]  Claes Wohlin,et al.  Experimentation in Software Engineering , 2000, The Kluwer International Series in Software Engineering.

[18]  Krzysztof Czarnecki,et al.  Feature Diagrams and Logics: There and Back Again , 2007 .

[19]  Jane Cleland-Huang,et al.  Clustering support for automated tracing , 2007, ASE '07.

[20]  Andrea De Lucia,et al.  On the role of the nouns in IR-based traceability recovery , 2009, 2009 IEEE 17th International Conference on Program Comprehension.

[21]  Anette Hulth,et al.  Improved Automatic Keyword Extraction Given More Linguistic Knowledge , 2003, EMNLP.

[22]  Sebastian Erdweg,et al.  Variability-aware parsing in the presence of lexical macros and conditional compilation , 2011, OOPSLA '11.

[23]  Denys Poshyvanyk,et al.  Feature location via information retrieval based filtering of a single scenario execution trace , 2007, ASE.

[24]  Gabriele Bavota,et al.  Automatic query reformulations for text retrieval in software engineering , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[25]  Krzysztof Czarnecki,et al.  Mining configuration constraints: static analyses and empirical results , 2014, ICSE.

[26]  Birger Møller-Pedersen,et al.  Augmenting Product Lines , 2012, 2012 19th Asia-Pacific Software Engineering Conference.

[27]  Andrew David Eisenberg,et al.  Dynamic feature traces: finding features in unfamiliar code , 2005, 21st IEEE International Conference on Software Maintenance (ICSM'05).

[28]  Patrick F. Reidy An Introduction to Latent Semantic Analysis , 2009 .

[29]  W. J. Conover,et al.  Practical Nonparametric Statistics , 1972 .

[30]  Zhenchang Xing,et al.  Feature Location in a Collection of Product Variants , 2012, 2012 19th Working Conference on Reverse Engineering.

[31]  Birger Møller-Pedersen,et al.  Adding Standardized Variability to Domain Specific Languages , 2008, 2008 12th International Software Product Line Conference.

[32]  Bernhard Ganter,et al.  Formal Concept Analysis: Mathematical Foundations , 1998 .

[33]  Krzysztof Czarnecki,et al.  Reverse engineering feature models , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[34]  Jaime Font,et al.  Feature location in models through a genetic algorithm driven by information retrieval techniques , 2016, MoDELS.

[35]  Birger Møller-Pedersen,et al.  Model Comparison to Synthesize a Model-Driven Software Product Line , 2011, 2011 15th International Software Product Line Conference.

[36]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[37]  Jane Huffman Hayes,et al.  Assessing traceability of software engineering artifacts , 2010, Requirements Engineering.

[38]  Edel Garcia Latent Semantic Indexing (LSI) A Fast Track Tutorial , 2006 .

[39]  Jane Huffman Hayes,et al.  Application of Swarm Techniques to Requirements Engineering: Requirements Tracing , 2010, 2010 18th IEEE International Requirements Engineering Conference.

[40]  Gerardo Canfora,et al.  Empirical Principles and an Industrial Case Study in Retrieving Equivalent Requirements via Natural Language Processing Techniques , 2013, IEEE Transactions on Software Engineering.

[41]  Jaime Font,et al.  Feature Location in Model-Based Software Product Lines Through a Genetic Algorithm , 2016, ICSR.

[42]  R. Grissom,et al.  Effect sizes for research: A broad practical approach. , 2005 .