Supporting requirements to code traceability through refactoring

In this paper, we hypothesize that the distorted traceability tracks of a software system can be systematically re-established through refactoring, a set of behavior-preserving transformations for keeping the system quality under control during evolution. To test our hypothesis, we conduct an experimental analysis using three requirements-to-code datasets from various application domains. Our objective is to assess the impact of various refactoring methods on the performance of automated tracing tools based on information retrieval. Results show that renaming inconsistently named code identifiers, using Rename Identifier refactoring, often leads to improvements in traceability. In contrast, removing code clones, using eXtract Method (XM) refactoring, is found to be detrimental. In addition, results show that moving misplaced code fragments, using Move Method refactoring, has no significant impact on trace link retrieval. We further evaluate Rename Identifier refactoring by comparing its performance with other strategies often used to overcome the vocabulary mismatch problem in software artifacts. In addition, we propose and evaluate various techniques to mitigate the negative impact of XM refactoring. An effective traceability sign analysis is also conducted to quantify the effect of these refactoring methods on the vocabulary structure of software systems.
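To illustrate why renaming identifiers can help an IR-based tracing tool, the following is a minimal sketch, not the tooling used in the study. It scores a requirement against a code fragment using cosine similarity over raw term frequencies, before and after a Rename Identifier refactoring. The requirement text, the code fragments, and the helper names `tokenize` and `cosine` are all hypothetical, and real tracing tools typically add stop-word removal, stemming, and term weighting such as TF-IDF.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Extract alphabetic words, split camelCase identifiers, and lowercase."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        # "computeTotalPrice" -> compute, Total, Price; "calcTP" -> calc, TP
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
        tokens.extend(part.lower() for part in parts)
    return tokens


def cosine(text_a, text_b):
    """Cosine similarity between the raw term-frequency vectors of two texts."""
    freq_a = Counter(tokenize(text_a))
    freq_b = Counter(tokenize(text_b))
    dot = sum(freq_a[term] * freq_b[term] for term in freq_a)
    norm_a = math.sqrt(sum(count * count for count in freq_a.values()))
    norm_b = math.sqrt(sum(count * count for count in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical requirement and code fragments, invented for illustration.
requirement = "The system shall compute the total price of an order."
code_before = "double calcTP(Order o) { return o.sum(); }"
code_after = "double computeTotalPrice(Order order) { return order.sum(); }"

score_before = cosine(requirement, code_before)
score_after = cosine(requirement, code_after)
print(f"before rename: {score_before:.3f}")
print(f"after rename:  {score_after:.3f}")
```

In this toy case the renamed identifiers share more vocabulary with the requirement, so the similarity score rises. This matches the direction of the Rename Identifier result reported above, but it says nothing about its size on real datasets.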
