DeepReview: Automatic Code Review Using Deep Multi-instance Learning

Code review, an inspection of code changes in order to identify and fix defects before integration, is essential in Software Quality Assurance (SQA). Code review is a time-consuming task since the reviewers need to understand, analysis and provide comments manually. To alleviate the burden of reviewers, automatic code review is needed. However, this task has not been well studied before. To bridge this research gap, in this paper, we formalize automatic code review as a multi-instance learning task that each change consisting of multiple hunks is regarded as a bag, and each hunk is described as an instance. We propose a novel deep learning model named DeepReview based on Convolutional Neural Network (CNN), which is an end-to-end model that learns feature representation to predict whether one change is approved or rejected. Experimental results on open source projects show that DeepReview is effective in automatic code review tasks. In terms of F1 score and AUC, DeepReview outperforms the performance of traditional single-instance based model TFIDF-SVM and the state-of-the-art deep feature based model Deeper.

[1]  Ming Li,et al.  Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code , 2017, IJCAI.

[2]  DongGyun Han,et al.  Writing Acceptable Patches: An Empirical Study of Open Source Project Patches , 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[3]  Xinli Yang,et al.  Deep Learning for Just-in-Time Defect Prediction , 2015, 2015 IEEE International Conference on Software Quality, Reliability and Security.

[4]  Sinno Jialin Pan,et al.  Transfer defect learning , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[5]  Chan-Gun Lee,et al.  Applying deep learning based automatic bug triager to industrial projects , 2017, ESEC/SIGSOFT FSE.

[6]  Song Wang,et al.  Automatically Learning Semantic Features for Defect Prediction , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[7]  Michael W. Godfrey,et al.  Investigating code review quality: Do people and participation matter? , 2015, 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[8]  M. Uihlein Open , 2018 .

[9]  Hajimu Iida,et al.  Who should review my code? A file location-based code-reviewer recommendation approach for Modern Code Review , 2015, 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[10]  Zhi-Hua Zhou,et al.  Learning Unified Features from Natural and Programming Languages for Locating Buggy Source Code , 2016, IJCAI.

[11]  Tian Jiang,et al.  Personalized defect prediction , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[12]  Tim Menzies,et al.  On the use of relevance feedback in IR-based concept location , 2009, 2009 IEEE International Conference on Software Maintenance.

[13]  Christian Bird,et al.  Convergent contemporary software peer review practices , 2013, ESEC/FSE 2013.

[14]  Tim Menzies,et al.  Data Mining Static Code Attributes to Learn Defect Predictors , 2007, IEEE Transactions on Software Engineering.

[15]  Tao Wang,et al.  Convolutional Neural Networks over Tree Structures for Programming Language Processing , 2014, AAAI.

[16]  Nicole Novielli,et al.  Confusion Detection in Code Reviews , 2017, 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[17]  Christian Bird,et al.  Automatically Recommending Peer Reviewers in Modern Code Review , 2016, IEEE Transactions on Software Engineering.

[18]  Ming Li,et al.  Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code , 2017, IJCAI.

[19]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[20]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[21]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[22]  Zhenchang Xing,et al.  Predicting semantically linkable knowledge in developer online forums via convolutional neural network , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[23]  Daniel M. German,et al.  Open source software peer review practices , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.