FineLocator: A novel approach to method-level fine-grained bug localization by query expansion

Abstract
Context: Bug localization, the task of locating suspicious snippets in source code files so that developers can fix a bug, is crucial for software quality assurance and software maintenance. An effective bug localization technique reduces the effort developers spend on bug resolution. State-of-the-art bug localization techniques concentrate on coarse-grained, file-level localization by lexically matching bug reports against source code files. This still leaves developers with the heavy burden of finding, within each reported file, the code snippets that must change to fix the bug.
Objective: This paper proposes FineLocator, a novel approach to fine-grained, method-level bug localization that uses semantic similarity, temporal proximity and call dependency for method expansion.
Method: First, bug reports and source code methods are represented as numeric vectors using word embedding (word2vec) and TF-IDF. Second, we propose three query expansion scores, namely a semantic similarity score, a temporal proximity score and a call dependency score, to address the representation sparseness problem caused by the short length of methods in source code. Using these scores, the representation of a short method is augmented with elements of its neighboring methods. Third, when a new bug report arrives, FineLocator retrieves methods from the source code by ranking the augmented methods according to their similarity to the bug report.
Results: We collect the bug repositories of the ArgoUML, Maven, Kylin, Ant and AspectJ projects to evaluate the proposed approach. Experimental results show that, compared with state-of-the-art techniques, FineLocator improves method-level bug localization performance by 20%, 21% and 17% on average as measured by Top-N, MAP and MRR, respectively.
Conclusion: This is the first paper to demonstrate how method expansion can be used to address the representation sparseness problem in method-level fine-grained bug localization.
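
The three steps in the Method paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything beyond the abstract is an assumption made for illustration: the toy methods and bug report, the use of TF-IDF cosine similarity as a stand-in for the word2vec-based semantic score, the exponential decay used for temporal proximity, the equal weights, and the mixing factor `alpha`. It only shows how a very short method can borrow terms from its neighbors before ranking.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Build smoothed TF-IDF row vectors for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    col = {tok: i for i, tok in enumerate(vocab)}
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    mat = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for tok, count in Counter(doc).items():
            idf = math.log((1 + len(docs)) / (1 + doc_freq[tok])) + 1.0
            mat[row, col[tok]] = (count / len(doc)) * idf
    return mat


def cosine(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a @ b.T


def expand_methods(vecs, semantic, temporal, call,
                   weights=(1 / 3, 1 / 3, 1 / 3), alpha=0.5):
    """Augment each method vector with its neighbors' vectors.

    The weighted sum of the three scores and the factor alpha are
    hypothetical choices, not values taken from the paper.
    """
    w_sem, w_tmp, w_call = weights
    neighbor = w_sem * semantic + w_tmp * temporal + w_call * call
    np.fill_diagonal(neighbor, 0.0)  # a method is not its own neighbor
    return vecs + alpha * (neighbor @ vecs)


def rank(report_vec, method_vecs):
    """Return method indices sorted by descending similarity to the report."""
    scores = cosine(report_vec[None, :], method_vecs)[0]
    return [int(i) for i in np.argsort(-scores)], scores


# Toy data (hypothetical): method 1 is a one-token method that calls method 0.
methods = [
    ["parse", "config", "file"],
    ["load"],
    ["render", "button", "color"],
]
report = ["config", "file", "parse", "error"]

vectors = tfidf_matrix(methods + [report])
method_vecs, report_vec = vectors[:-1], vectors[-1]

# Stand-in for the word2vec-based semantic similarity score.
semantic = cosine(method_vecs, method_vecs)
# Assumed temporal proximity: decays with the gap between last-change days.
change_day = np.array([10.0, 11.0, 200.0])
temporal = np.exp(-np.abs(change_day[:, None] - change_day[None, :]) / 30.0)
# Call dependency as a symmetric adjacency matrix (method 1 calls method 0).
call = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)

ranking_before, scores_before = rank(report_vec, method_vecs)
augmented = expand_methods(method_vecs, semantic, temporal, call)
ranking_after, scores_after = rank(report_vec, augmented)

print("score of short method before expansion:", scores_before[1])
print("score of short method after expansion: ", scores_after[1])
print("ranking after expansion:", ranking_after)
```

In this toy setting the one-token method shares no terms with the bug report, so plain lexical matching cannot retrieve it. After expansion it inherits terms from the method it calls and which was changed at nearly the same time, so it becomes retrievable while the unrelated method stays at the bottom of the ranking.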
