Multi-Dimension Convolutional Neural Network for Bug Localization

Software bugs remain frequent throughout the life cycle of software development and maintenance. Automatically localizing buggy source code files is critical for fixing bugs promptly and for improving the efficiency of software quality assurance. Various bug localization techniques have been proposed, each using different dimensions of features, and recent studies have shown that these dimensions may play different roles in bug localization. However, how to merge them effectively to improve bug localization has rarely been investigated. This paper presents a Multi-Dimension Convolutional Neural Network (MD-CNN) model that automatically localizes bugs based on a bug report. Our approach is novel in two respects. First, we identify and extract five statistical dimensions of features. Second, we design a Convolutional Neural Network (CNN) model that takes these five dimensions of features as input and iteratively learns the complex, non-linear relationship between the features and the bug locations. We evaluate the MD-CNN bug localization model on six large-scale open source projects. The experimental results show that MD-CNN outperforms existing representative bug localization techniques in terms of Mean Average Precision (MAP) and the number of bugs successfully localized in the top 1, 5, and 10 matched source code files.
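For readers unfamiliar with the two evaluation measures named above, the following is a minimal sketch of how MAP and Top-N counts are commonly computed for ranked file lists. It is not the paper's evaluation code. The function names, the bug identifiers, and the file names are hypothetical, and average precision is normalized here by the number of truly buggy files, which is one common convention and an assumption on our part.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], buggy: Set[str]) -> float:
    """Average precision of one ranked file list against the truly buggy files."""
    hits = 0
    total = 0.0
    for rank, path in enumerate(ranked, start=1):
        if path in buggy:
            hits += 1
            total += hits / rank  # precision at this rank
    # Assumption: normalize by the number of buggy files for this report.
    return total / len(buggy) if buggy else 0.0


def mean_average_precision(rankings: Dict[str, List[str]],
                           truth: Dict[str, Set[str]]) -> float:
    """Mean of the per-report average precision values."""
    return sum(average_precision(rankings[b], truth[b]) for b in rankings) / len(rankings)


def top_k_hits(rankings: Dict[str, List[str]],
               truth: Dict[str, Set[str]], k: int) -> int:
    """Number of bug reports with at least one buggy file in the top k results."""
    return sum(1 for b in rankings if any(f in truth[b] for f in rankings[b][:k]))


# Hypothetical toy data: two bug reports, each with a ranked list of candidate files.
rankings = {
    "bug-1": ["A.java", "B.java", "C.java"],
    "bug-2": ["D.java", "E.java", "F.java"],
}
truth = {
    "bug-1": {"A.java", "C.java"},
    "bug-2": {"E.java"},
}

print(mean_average_precision(rankings, truth))
print(top_k_hits(rankings, truth, 1), top_k_hits(rankings, truth, 5))
```

In this toy data, the first report has buggy files at ranks 1 and 3 and the second has one at rank 2, so only the first report counts as a Top-1 hit while both count as Top-5 hits.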
