A comprehensive characterization of NLP techniques for identifying equivalent requirements

Although important in software engineering, linking artifacts of the same type (clone detection) or of different types (traceability recovery) is tedious, error-prone, and effort-intensive. Past research has focused on supporting analysts with mechanisms based on Natural Language Processing (NLP) that identify candidate links. Because a plethora of NLP techniques exists and their performance varies across contexts, it is important to characterize them according to the level of support they provide. The aim of this paper is to characterize a comprehensive set of NLP techniques according to the level of support they provide to human analysts in detecting equivalent requirements. The characterization consists of a case study, featuring real requirements, in the context of an Italian company in the defense and aerospace domain. The major result of the case study is that simple NLP techniques are more precise than complex ones.
