Exploring the Benefits of Utilizing Conceptual Information in Test-to-Code Traceability

The pursuit of reliable software systems often produces an immense number of tests. Because no annotation for this purpose is in general use, finding the parts of the code that a given test was meant to assess can be a demanding task. This problem is known in software engineering as test-to-code traceability. Recent research has addressed it with various approaches and their combinations, achieving promising results. These approaches include relying on naming conventions followed during development and applying information retrieval (IR) methods that exploit textual information, often referred to as conceptual information. In this work we investigate the benefits of the textual information located in software code and its value for aiding traceability. We evaluated the natural language processing technique Latent Semantic Indexing (LSI) against the results of the naming conventions technique on five real, medium-sized software systems. Although LSI has already been used for this purpose, we extend the one-to-one traceability viewpoint to the more versatile view of LSI as a recommendation system. We found that considering the top 5 elements of the ranked list improves the results by 30% on average and makes LSI a viable alternative in projects where naming conventions are not followed systematically.
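
To make the contrast between one-to-one naming-convention matching and a ranked recommendation list concrete, the following is a minimal sketch rather than the method evaluated in this work. It assumes test classes are named by adding "Test" to the class under test, and it uses a plain bag-of-words cosine similarity as a stand-in for LSI (no latent-space projection is performed). All function names and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def split_identifier(text):
    """Split camelCase / snake_case identifiers in text into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", text)
    return [p.lower() for p in parts]


def naming_convention_link(test_class, production_classes):
    """One-to-one link: strip a leading 'Test' or trailing 'Test(s)' from the name."""
    base = re.sub(r"^Test|Tests?$", "", test_class)
    return base if base in production_classes else None


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(test_text, production_texts):
    """Rank production classes by textual similarity to the test (stand-in for LSI)."""
    query = Counter(split_identifier(test_text))
    scored = [
        (cosine(query, Counter(split_identifier(text))), name)
        for name, text in production_texts.items()
    ]
    # Highest similarity first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored]


def hit_at(ranked, expected, n):
    """Recommendation view: is the expected class among the top n candidates?"""
    return expected in ranked[:n]


# Hypothetical toy corpus: class name -> identifiers found in its source code.
production = {
    "Stack": "Stack push pop peek isEmpty",
    "Queue": "Queue enqueue dequeue isEmpty",
    "LinkedList": "LinkedList addNode removeNode head",
}

# A test class that does not follow the assumed naming convention.
test_name = "CheckLinkedList"
test_text = "CheckLinkedList testAddNode testRemoveNode"

link = naming_convention_link(test_name, production)
ranked = rank_candidates(test_text, production)
print("naming convention link:", link)
print("ranked candidates:", ranked)
print("hit in top 5:", hit_at(ranked, "LinkedList", 5))
```

In this sketch the naming-convention lookup finds no link for the unconventionally named test, while the ranked list can still surface the intended class among its first few entries, which is the situation where a recommendation-style view is useful.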
