Exploiting spatial code proximity and order for improved source code retrieval for bug localization

Practically all information-retrieval-based approaches developed to date for automatic bug localization rest on the bag-of-words assumption, which ignores positional and ordering relationships between the terms in a query. We argue that bug reports are ill-served by this assumption, because such reports frequently contain structural information whose terms must obey certain positional and ordering constraints. It stands to reason that retrieval quality for bug localization would improve if these constraints were taken into account when searching for the most relevant files, and we demonstrate that this is indeed the case. We show how the well-known Markov Random Field based retrieval framework can be used to match the term-term proximity and ordering relationships in a query against the same relationships in the files of a source-code library, greatly improving the retrieval of the most relevant source files. We have carried out our experimental evaluations on popular large software projects using over 4000 bug reports. The results demonstrate that the proposed approach is far superior to the widely used bag-of-words based approaches. Copyright © 2016 John Wiley & Sons, Ltd.
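
To make the contrast with bag-of-words concrete, the following is a minimal sketch of proximity- and order-aware scoring in the spirit of Markov Random Field term-dependence models. It is not the authors' implementation: the function names, the log(1 + count) feature values, the weights (0.85, 0.10, 0.05) and the window of 8 tokens are all illustrative assumptions, and a real system would use smoothed language-model estimates over an indexed collection.

```python
import math
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t]


def count_unigram(term, doc):
    """Number of occurrences of a single term in the document tokens."""
    return sum(1 for t in doc if t == term)


def count_ordered(a, b, doc):
    """Number of positions where `a` is immediately followed by `b`."""
    return sum(1 for i in range(len(doc) - 1) if doc[i] == a and doc[i + 1] == b)


def count_unordered(a, b, doc, window=8):
    """Number of (a, b) co-occurrences within `window` tokens, in any order."""
    count = 0
    for i, t in enumerate(doc):
        if t == a:
            lo, hi = max(0, i - window + 1), min(len(doc), i + window)
            count += sum(1 for j in range(lo, hi) if j != i and doc[j] == b)
    return count


def score(query, doc_text, weights=(0.85, 0.10, 0.05), window=8):
    """Weighted sum of unigram, ordered-pair and unordered-window evidence.

    The weights and window size are illustrative assumptions, not values
    taken from the paper.
    """
    q = tokenize(query)
    d = tokenize(doc_text)
    pairs = list(zip(q, q[1:]))  # adjacent query-term pairs
    uni = sum(math.log1p(count_unigram(t, d)) for t in q)
    ordered = sum(math.log1p(count_ordered(a, b, d)) for a, b in pairs)
    unordered = sum(math.log1p(count_unordered(a, b, d, window)) for a, b in pairs)
    w_t, w_o, w_u = weights
    return w_t * uni + w_o * ordered + w_u * unordered
```

With weights (1, 0, 0) the score reduces to a pure bag-of-words measure, so two files containing the same terms in different orders tie; with nonzero pair weights, the file that preserves the query's term order and adjacency ranks higher.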
