A case study on machine learning model for code review expert system in software engineering

Code review is a key tool for quality assurance in software development. It is intended to find coding mistakes overlooked during the development phase and to lower the risk of bugs in the final product. In large and complex projects, accurate code review is a challenging task. Because code review depends on the predisposition of the individual reviewer, a certain share of source code changes is not checked as thoroughly as it should be. In this paper we propose a machine learning approach for identifying project artifacts that are at significant risk of failure. Planning and adjusting quality assurance (QA) activities could benefit strongly from an accurate estimate of the software areas endangered by defects, since extended code review could be directed there. The proposed approach has been evaluated for feasibility on a large medical software project. Significant work was done to extract features from heterogeneous production data, leading to a good predictive model. Our preliminary research results were considered worthy of implementation in the company where the research was conducted, thus opening opportunities for the continuation of the studies.
