CooBa: Cross-project Bug Localization via Adversarial Transfer Learning

Bug localization plays an important role in software quality control. Many supervised machine learning models have been developed based on historical bug-fix information. Despite being successful, these methods often require sufficient historical data (i.e., labels), which is not always available especially for newly developed software projects. In response, cross-project bug localization techniques have recently emerged whose key idea is to transferring knowledge from label-rich source project to locate bugs in the target project. However, a major limitation of these existing techniques lies in that they fail to capture the specificity of each individual project, and are thus prone to negative transfer. To address this issue, we propose an adversarial transfer learning bug localization approach, focusing on only transferring the common characteristics (i.e., public information) across projects. Specifically, our approach (COOBA) learns the indicative public information from cross-project bug reports through a shared encoder, and extracts the private information from code files by an individual feature extractor for each project. COOBA further incorporates adversarial learning to ensure that public information shared between multiple projects could be effectively extracted. Extensive experiments on four large-scale real-world data sets demonstrate that the proposed COOBA significantly outperforms the state of the art techniques.

[1]  Ayse Basar Bener,et al.  On the relative value of cross-company and within-company data for defect prediction , 2009, Empirical Software Engineering.

[2]  Jeffrey S. Foster,et al.  Understanding source code evolution using abstract syntax tree matching , 2005, MSR.

[3]  Jose M. Such,et al.  International Joint Conference on Artificial Intelligence (IJCAI) , 2016 .

[4]  Andreas Zeller,et al.  Where Should We Fix This Bug? A Two-Phase Recommendation Model , 2013, IEEE Transactions on Software Engineering.

[5]  Automated Software Engineering , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[6]  Chanchal Kumar Roy,et al.  Improving IR-based bug localization with context-aware query reformulation , 2018, ESEC/SIGSOFT FSE.

[7]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[8]  Linmei Hu,et al.  Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification , 2019, EMNLP.

[9]  Xindong Wu,et al.  Proceedings of the 10th IEEE International Conference on Data Mining (ICDM) , 2010 .

[10]  Dong Liu,et al.  Unsupervised Deep Bug Report Summarization , 2018, 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC).

[11]  Proceedings of the 26th Conference on Program Comprehension , 2018, ICPC.

[12]  Phil Blunsom,et al.  A Convolutional Neural Network for Modelling Sentences , 2014, ACL.

[13]  Ming Li,et al.  Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code , 2017, IJCAI.

[14]  Peter Tino,et al.  IEEE Transactions on Neural Networks , 2009 .

[15]  Zhi-Hua Zhou,et al.  Learning Unified Features from Natural and Programming Languages for Locating Buggy Source Code , 2016, IJCAI.

[16]  Ye Yang,et al.  An investigation on the feasibility of cross-project defect prediction , 2012, Automated Software Engineering.

[17]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[18]  Harald C. Gall,et al.  Cross-project defect prediction: a large scale experiment on data vs. domain vs. process , 2009, ESEC/SIGSOFT FSE.

[19]  Miltiadis Allamanis,et al.  Proceedings of the 22Nd ACM SIGSOFT International Symposium on Foundations of Software Engineering , 2014 .

[20]  Carl Vogel,et al.  Proceedings of the 16th International Conference on Computational Linguistics , 1996, COLING 1996.

[21]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[22]  Zhiyuan Liu,et al.  Adversarial Multi-lingual Neural Relation Extraction , 2018, COLING.

[23]  David Budgen,et al.  Empirical Software Engineering , 2014, Computing Handbook, 3rd ed..

[24]  David Lo,et al.  Deep Transfer Bug Localization , 2019, IEEE Transactions on Software Engineering.

[25]  L. Carvajal,et al.  IEEE Transactions on Software Engineering , 2016 .

[26]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[27]  Ali Selamat,et al.  Information and Software Technology , 2014 .

[28]  Sinno Jialin Pan,et al.  Transfer defect learning , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[29]  Anh Tuan Nguyen,et al.  Bug Localization with Combination of Deep Learning and Information Retrieval , 2017, 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC).

[30]  Jian Zhou,et al.  Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[31]  Harald C. Gall,et al.  Proceedings of the 2006 international workshop on Mining software repositories , 2006, International Conference on Software Engineering.