Estimating the number of remaining links in traceability recovery

Although establishing traceability links between software artifacts is very important in software engineering, it is extremely tedious, error-prone, and effort-intensive. Even where automated traceability recovery approaches are available, they provide the requirements analyst with a ranked list of candidate links, usually a very long one, that must be inspected manually. In this paper we introduce an approach called Estimation of the Number of Remaining Links (ENRL), which aims to estimate, using Machine Learning (ML) classifiers, the number of remaining positive links in a ranked list of candidate traceability links produced by a recovery approach based on Natural Language Processing (NLP) techniques. We evaluated the accuracy of ENRL with several ML classifiers and NLP techniques on three datasets from industry and academia, covering traceability links among different kinds of software artifacts, including requirements, use cases, design documents, source code, and test cases. The results of our study indicate that: (i) specific estimation models provide accurate estimates of the number of remaining positive links; (ii) the estimation accuracy depends on the choice of NLP technique; and (iii) univariate estimation models outperform multivariate ones.
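The abstract does not spell out how a classifier's output becomes a count, so the following is only a minimal sketch of the general idea under loudly stated assumptions. It assumes a univariate model whose single feature is the similarity score assigned by the NLP technique, uses a decision stump (a one-level decision tree) as a stand-in for the ML classifiers evaluated in the study, and assumes the training data are candidate links whose status is already known, for example links the analyst has already vetted. The names `fit_stump`, `estimate_remaining_links`, and `LabeledLink` are hypothetical.

```python
"""Hypothetical sketch of estimating the remaining positive links in a ranked list.

Illustrative assumptions, not the ENRL implementation: each candidate link is
described only by its NLP similarity score (a univariate model), the classifier
is a one-level decision tree (decision stump), and the estimate is the number of
uninspected candidates that the stump labels as positive.
"""
from typing import Sequence, Tuple

# (similarity score, True if the candidate is a true traceability link)
LabeledLink = Tuple[float, bool]


def fit_stump(training: Sequence[LabeledLink]) -> float:
    """Return the score threshold that classifies the training links most accurately.

    Candidate thresholds are one value below all scores (everything positive),
    the midpoints between consecutive distinct scores, and one value above all
    scores (nothing positive).
    """
    if not training:
        raise ValueError("training must contain at least one labeled link")
    scores = sorted({score for score, _ in training})
    candidates = [scores[0] - 1.0]
    candidates += [(lo + hi) / 2 for lo, hi in zip(scores, scores[1:])]
    candidates.append(scores[-1] + 1.0)

    def correct(threshold: float) -> int:
        # Number of training links whose predicted label (score >= threshold)
        # matches the vetted label.
        return sum((score >= threshold) == is_link for score, is_link in training)

    # Candidates are ascending and max() keeps the first of tied maxima,
    # so ties resolve to the lowest (most recall-oriented) threshold.
    return max(candidates, key=correct)


def estimate_remaining_links(
    training: Sequence[LabeledLink], uninspected_scores: Sequence[float]
) -> int:
    """Estimate how many of the uninspected candidates are true links."""
    threshold = fit_stump(training)
    return sum(1 for score in uninspected_scores if score >= threshold)


if __name__ == "__main__":
    # Toy data invented for illustration: candidates already vetted by the analyst.
    vetted = [(0.91, True), (0.84, True), (0.77, True), (0.70, False),
              (0.66, True), (0.52, False), (0.45, False), (0.31, False)]
    # Scores of the candidates further down the ranked list, not yet inspected.
    remaining = [0.72, 0.63, 0.40, 0.22]
    print(estimate_remaining_links(vetted, remaining))  # prints 2
```

In the toy run the stump settles on a threshold of about 0.59 and flags two of the four uninspected candidates. Counting predicted positives is the simplest way to turn a per-link classifier into a count estimate; summing the predicted probabilities of a probabilistic classifier would be an equally plausible aggregation under the same assumptions.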
