Automatic Patch Linkage Detection in Code Review Using Textual Content and File Location Features

Abstract

Context: Contemporary code review tools are a popular choice for software quality assurance. Using these tools, reviewers can post a linkage between two patches during a review discussion. Large development teams that use a review-then-commit model risk being unaware of these linkages.

Objective: We first explore how patch linkage affects the review process. We then propose and evaluate models that detect patch linkage based on realistic time intervals.

Method: First, we carry out an exploratory study of three open source projects, analyzing the impact of linkage using 942 manually classified linkages. Second, we propose two techniques that use textual and file location similarity to build detection models, and we evaluate their performance.

Results: The study provides evidence of latency in linkage notification. We show that a patch with the Alternative Solution linkage (i.e., patches that implement similar functionality) undergoes a quicker review and avoids additional revisions after the team has been notified, compared to other linkage types. Our detection model experiments show promising recall rates for the Alternative Solution linkage (from 32% to 95%), but precision has room for improvement.

Conclusion: Patch linkage detection is promising and is likely to improve if the practice of posting linkages becomes more prevalent. The implications of our findings lay the groundwork for future research on how to increase patch linkage awareness and facilitate efficient reviews.
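To make the two feature families named in the Method concrete, the following is a minimal sketch of one way textual similarity and file location similarity between two patches could be computed. It is not the authors' implementation: the abstract does not specify the measures, so TF-IDF cosine similarity over patch descriptions and a shared-leading-directory ratio over file paths are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build smoothed TF-IDF vectors (term -> weight dicts) for token lists.

    Assumption: a smoothed IDF, log((1 + n) / (1 + df)) + 1, is used so that
    terms appearing in every document still receive a positive weight.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def path_similarity(p, q):
    """Shared leading path components divided by the longer path's length."""
    a, b = p.split("/"), q.split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))


def file_location_similarity(files_a, files_b):
    """Average best-match path similarity, taken in both directions."""
    if not files_a or not files_b:
        return 0.0
    scores = [max(path_similarity(f, g) for g in files_b) for f in files_a]
    scores += [max(path_similarity(g, f) for f in files_a) for g in files_b]
    return sum(scores) / len(scores)
```

A detection model could then treat these two scores as features for a candidate patch pair and apply a threshold or a classifier. How the scores are combined and thresholded is left open here, since the abstract does not describe it.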
