Automatic Decomposition of Java Open Source Pull Requests: A Replication Study

The presence of large changesets containing several independent modifications (e.g., bug fixes, refactorings, new features) can negatively affect the efficacy of code review. To cope with this problem, Barnett et al. developed ClusterChanges, a lightweight static analysis technique for decomposing changesets into distinct partitions that can be reviewed independently. They found that ClusterChanges can indeed decompose such changesets and that developers agree with the resulting decomposition. However, the authors restricted their analysis to software that is (i) closed source, (ii) written in C#, and (iii) developed by a single organization. To address this threat to external validity, we implemented JClusterChanges, a free and open source (FOSS) implementation of ClusterChanges for Java software, and replicated the original Barnett et al. study using changesets from Java open source projects hosted on GitHub. We found that open source changesets are similar to those of the original study. Our research thus confirms that the problem is relevant in other contexts, and it provides a FOSS implementation of ClusterChanges that not only demonstrates the feasibility of the technique beyond its original setting but can also support future research.
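The abstract describes ClusterChanges only at a high level. As a minimal, hypothetical sketch of the underlying idea (in the original Barnett et al. paper, changed regions of a diff are related by def-use links between symbols and grouped into partitions), the Java program below merges changed regions with a union-find structure. All names here (ChangePartitioner, Region, the symbol names) are illustrative assumptions, not part of the JClusterChanges implementation.

```java
import java.util.*;

// Conceptual sketch only (not the authors' code): group changed regions of a
// diff into independently reviewable partitions. Regions that share a
// def-use relationship (one defines a symbol the other uses) end up in the
// same partition via union-find.
public class ChangePartitioner {

    // A changed region: the symbols it defines and the symbols it uses.
    // (Hypothetical model; a real tool would extract these via static analysis.)
    record Region(String id, Set<String> defs, Set<String> uses) {}

    public static void main(String[] args) {
        List<Region> regions = List.of(
            new Region("r1", Set.of("parseConfig"), Set.of()),
            new Region("r2", Set.of(), Set.of("parseConfig")),  // uses r1's def
            new Region("r3", Set.of("renderPage"), Set.of()));  // unrelated fix

        System.out.println(partition(regions)); // => [[r1, r2], [r3]]
    }

    static Collection<List<String>> partition(List<Region> regions) {
        Map<String, String> parent = new HashMap<>();
        for (Region r : regions) parent.put(r.id(), r.id());

        // Index which region defines each symbol (assumes unique defs).
        Map<String, String> definedBy = new HashMap<>();
        for (Region r : regions)
            for (String d : r.defs()) definedBy.put(d, r.id());

        // Merge each region with the region defining each symbol it uses.
        for (Region r : regions)
            for (String u : r.uses()) {
                String def = definedBy.get(u);
                if (def != null) union(parent, r.id(), def);
            }

        // Collect the resulting partitions (TreeMap for deterministic output).
        Map<String, List<String>> groups = new TreeMap<>();
        for (Region r : regions)
            groups.computeIfAbsent(find(parent, r.id()),
                                   k -> new ArrayList<>()).add(r.id());
        return groups.values();
    }

    static String find(Map<String, String> p, String x) {
        while (!p.get(x).equals(x)) x = p.get(x);
        return x;
    }

    static void union(Map<String, String> p, String a, String b) {
        p.put(find(p, a), find(p, b));
    }
}
```

Running the sketch prints two partitions: the config-parsing edits (r1, r2) form one reviewable unit, and the unrelated rendering fix (r3) forms another, which is the kind of decomposition the study evaluates.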

[1] Daniela E. Damian, et al. The promises and perils of mining GitHub, 2014, MSR.

[2] Stefan Biffl, et al. Software Reviews: The State of the Practice, 2003, IEEE Softw.

[3] Arie van Deursen, et al. An exploratory study of the pull-based software development model, 2014, ICSE.

[4] Dongmei Zhang, et al. How do software engineers understand code changes? An exploratory study in industry, 2012, SIGSOFT FSE.

[5] Dawson R. Engler, et al. A few billion lines of code later, 2010, Commun. ACM.

[6] Daniel M. Germán, et al. Peer Review on Open-Source Software Projects: Parameters, Statistical Models, and Theory, 2014, TOSEM.

[7] Natalia Juristo Juzgado, et al. Understanding replication of experiments in software engineering: A classification, 2014, Inf. Softw. Technol.

[8] Alberto Bacchelli, et al. Expectations, outcomes, and challenges of modern code review, 2013, ICSE.

[9] Lasse Harjumaa, et al. Peer reviews in real life - motivators and demotivators, 2005, QSIC.

[10] Georgios Gousios, et al. Untangling fine-grained code changes, 2015, SANER.

[11] Anh Tuan Nguyen, et al. Filtering noise in mixed-purpose fixing commits to improve defect prediction and localization, 2013, ISSRE.

[12] Forrest Shull, et al. Building Knowledge through Families of Experiments, 1999, IEEE Trans. Software Eng.

[13] Andreas Zeller, et al. The impact of tangled code changes, 2013, MSR.

[14] Stéphane Ducasse, et al. Do Tools Support Code Integration? A Survey, 2016, J. Object Technol.

[15] Sunghun Kim, et al. Partitioning Composite Code Changes to Facilitate Code Review, 2015, MSR.

[16] Forrest Shull, et al. Inspecting the History of Inspections: An Example of Evidence-Based Technology Diffusion, 2008, IEEE Softw.

[17] Shinji Kusumoto, et al. Hey! Are you committing tangled changes?, 2014, ICPC.

[18] Shuvendu K. Lahiri, et al. Helping Developers Help Themselves: Automatic Decomposition of Code Review Changesets, 2015, ICSE.

[19] J. Herbsleb, et al. Two case studies of open source software development: Apache and Mozilla, 2002, TOSEM.