How to Explain a Patch: An Empirical Study of Patch Explanations in Open Source Projects

Abstract: Bugs are inevitable in software development and maintenance. Recently, considerable research effort has been devoted to automatic program repair, which aims to reduce the effort of debugging. However, because it is difficult to ensure that generated patches meet all quality requirements, such as correctness, developers still need to review them. In addition, current techniques produce patches without any explanation, which makes the patches difficult for developers to understand. We therefore believe that a more desirable approach should generate not only the patch but also an explanation of it. To generate patch explanations, it is important to first understand how developers explain patches. In this paper, we explore how developers explain their patches by manually analyzing 300 merged bug-fixing pull requests from six projects on GitHub. Our contribution is twofold. First, we build a patch explanation model that summarizes the elements of a patch explanation and their corresponding expressive forms. Second, we conduct a quantitative analysis to understand the distribution of these elements and the correlation between elements and their expressive forms.
