Analyzing Requirements and Traceability Information to Improve Bug Localization

Locating bugs in industry-size software systems is time consuming and challenging. An automated approach that assists developers in tracing from bug descriptions to relevant source code is therefore valuable. A large body of previous work addresses this problem and demonstrates considerable achievements. Most existing approaches focus on improving techniques based on textual similarity to identify relevant files. However, there is a lexical gap between the natural language used to formulate bug reports and the formal source code and its comments. To bridge this gap, state-of-the-art approaches include a component that analyzes bug history information to increase retrieval performance. In this paper, we propose a novel approach, TraceScore, that also utilizes projects' requirements information and explicit dependency trace links to further close the gap when relating a new bug report to defective source code files. Our evaluation on more than 13,000 bug reports shows that TraceScore significantly outperforms two state-of-the-art methods. Further, by integrating TraceScore into an existing bug localization algorithm, we found that TraceScore significantly improves retrieval performance by 49% in terms of mean average precision (MAP).
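
For readers unfamiliar with the metric, the following is a minimal sketch of how MAP is commonly computed for bug localization, assuming each bug report yields a ranked list of candidate files and a known set of files that were actually fixed. The function names and the toy file names are hypothetical and are not taken from the TraceScore implementation.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked_files: Sequence[str], relevant: Set[str]) -> float:
    """Average precision for one bug report.

    Averages precision@k over every rank k at which a relevant (defective)
    file appears, normalized by the number of relevant files.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, path in enumerate(ranked_files, start=1):
        if path in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    results: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of the per-report average precision over all bug reports."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in results]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): two bug reports with their ranked candidate files
# and the files that were actually changed to fix each bug.
reports = [
    (["a.java", "b.java", "c.java"], {"a.java", "c.java"}),
    (["x.java", "y.java"], {"y.java"}),
]
map_score = mean_average_precision(reports)
```

In this sketch a relevant file that never appears in the ranking contributes zero precision, so missing a defective file lowers the score for that report.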
