Using a Balanced Scorecard to Identify Opportunities to Improve Code Review Effectiveness: An Industrial Experience Report

Peer code review is a widely adopted software engineering practice to ensure code quality and software reliability in both commercial and open-source software projects. Due to the large effort overhead associated with practicing code reviews, project managers often wonder whether their code reviews are effective and whether there are opportunities for improvement. Since project managers at Samsung Research Bangladesh (SRBD) were also intrigued by these questions, this research developed, deployed, and evaluated a production-ready solution using the Balanced Scorecard (BSC) strategy that SRBD managers can use in their day-to-day management to monitor the code review effectiveness of an individual developer, a particular project, or the entire organization. Following the four-step framework of the BSC strategy, we: 1) defined the operational goals of this research, 2) defined a set of metrics to measure the effectiveness of code reviews, 3) developed an automated mechanism to measure those metrics, and 4) developed and evaluated a monitoring application to inform the key stakeholders. Our automated model to identify useful code reviews achieves 7.88% and 14.39% improvement in accuracy and minority class F1 score, respectively, over the models proposed in prior studies. It also outperforms the human evaluators from SRBD, whom the model replaces, by margins of 25.32% and 23.84% in accuracy and minority class F1 score, respectively. In our post-deployment survey, SRBD developers and managers indicated that they found our solution useful and that it provided them with important insights to support their decision making.

M. Hasan, M. Islam, and A. Iqbal
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
E-mail: masum@ra.cse.buet.ac.bd, rafid@openrefactory.com, anindya@cse.buet.ac.bd

A.J.M. Rahman
Samsung R&D Institute Bangladesh, Dhaka, Bangladesh
E-mail: m.imtiaz@samsung.com

Amiangshu Bosu
Department of Computer Science, Wayne State University, Detroit, Michigan, USA
E-mail: amiangshu.bosu@wayne.edu

arXiv:2101.10585v2 [cs.SE] 12 Aug 2021
