How We Refactor and How We Mine It? A Large-Scale Study on Refactoring Activities in Open Source Systems

Refactoring, as coined by William Opdyke in 1992, is the art of optimizing the syntactic design of a software system without altering its external behavior. Martin Fowler later cataloged refactorings as a response to design defects that negatively impact a software system's design. Since then, refactoring research has largely focused on improving system structure. However, recent studies have shown that developers may incorporate refactoring strategies into other development-related activities that go beyond improving the design. In this context, we aim to better understand developers' perception of refactoring by mining and automatically classifying refactoring activities in 1,706 open source Java projects. We perform a differentiated replication of the pioneering work by Tsantalis et al. We revisit five research questions presented in this previous empirical study and compare our results to those of the original work. The original study investigates the types of refactorings applied to different source types (i.e., production vs. test code), the degree to which experienced developers contribute to refactoring efforts, the chronological collocation of refactoring with release and testing periods, and developers' intentions behind specific types of refactorings. We reexamine the same questions on a larger number of systems. To do this, our approach relies on mining refactoring instances performed throughout several releases of each project we studied. We also mined several properties of these projects, such as their commits, contributors, issues, and test files. Our findings confirm some of the results of the previous study, and we highlight some differences for discussion.
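The abstract states that refactoring activities are classified automatically but does not describe the classifier. As a rough illustration only, the following Python sketch assigns a commit message to one of the motivation categories discussed in this study (bug fix, feature addition, design improvement) by counting keyword-prefix hits. The keyword lists, the function names `classify_commit` and `motivation_distribution`, and the tie-breaking rule are hypothetical assumptions, not the study's method; a faithful replication would rely on a trained classifier rather than fixed keywords.

```python
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only; not the study's classifier.
CATEGORY_KEYWORDS = {
    "bug_fix": ["fix", "bug", "defect", "error", "crash"],
    "feature_addition": ["add", "feature", "implement", "support"],
    "design_improvement": ["refactor", "clean", "simplify", "restructure", "extract"],
}


def classify_commit(message: str) -> str:
    """Assign a commit message to the category with the most keyword hits.

    A token counts as a hit if it starts with a keyword, so "fixes" and
    "fixed" both match "fix". Returns "other" when nothing matches; ties go
    to the category listed first in CATEGORY_KEYWORDS.
    """
    tokens = re.findall(r"[a-z]+", message.lower())
    scores = Counter()
    for category, keywords in CATEGORY_KEYWORDS.items():
        scores[category] = sum(
            1 for token in tokens if any(token.startswith(k) for k in keywords)
        )
    if max(scores.values()) == 0:
        return "other"
    # max() returns the first maximal key, i.e. the earliest-listed category.
    return max(CATEGORY_KEYWORDS, key=lambda c: scores[c])


def motivation_distribution(messages):
    """Tally how many commit messages fall into each category."""
    return Counter(classify_commit(m) for m in messages)


if __name__ == "__main__":
    commits = [
        "Fix NPE when parsing config",
        "Extract method to simplify parser",
        "Add support for YAML export",
        "Update README",
    ]
    for msg in commits:
        print(f"{classify_commit(msg):20} {msg}")
    print(motivation_distribution(commits))
```

Prefix matching keeps the sketch short but is crude: it will also match unrelated words that happen to share a prefix, which is one reason learned classifiers are preferred in practice.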
We found that 1) feature addition and bug fixes are strong motivators for developers to refactor their code base, rather than the traditional design improvement motivation; 2) a variety of refactoring types are applied when refactoring both production and test code; 3) refactorings tend to be applied by experienced developers who have contributed a large number of commits to the code; and 4) there is a correlation between the type of refactoring activity taking place and whether the source code is undergoing a release or a test period.
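The fourth finding concerns whether refactoring activity clusters around release periods. One simple way to probe this, sketched here in Python under stated assumptions, is to count refactorings in a fixed window before and after each release date. The 15-day default window, the boundary convention (a refactoring on the release day counts as "before"), and the function name `refactorings_around_releases` are hypothetical choices for illustration, not the study's exact procedure.

```python
from datetime import date, timedelta


def refactorings_around_releases(refactoring_dates, release_dates, window_days=15):
    """Count refactorings in the window before and after each release.

    Hypothetical helper: a refactoring on the release day itself is counted
    as 'before'; window_days is an assumed parameter, not from the study.
    Returns a list of (release_date, before_count, after_count) tuples.
    """
    window = timedelta(days=window_days)
    results = []
    for release in sorted(release_dates):
        # Refactorings in (release - window, release]
        before = sum(1 for d in refactoring_dates if release - window < d <= release)
        # Refactorings in (release, release + window]
        after = sum(1 for d in refactoring_dates if release < d <= release + window)
        results.append((release, before, after))
    return results


if __name__ == "__main__":
    refactorings = [date(2020, 2, 20), date(2020, 2, 28), date(2020, 3, 1),
                    date(2020, 3, 5), date(2020, 4, 1)]
    releases = [date(2020, 3, 1)]
    for release, before, after in refactorings_around_releases(refactorings, releases):
        print(release, before, after)  # prints "2020-03-01 3 1"
```

Comparing the before and after counts across many releases, for example with a paired statistical test, would be one way to check whether refactoring density changes around releases; the same windowing idea could be applied to periods of intense test-file activity.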

[1] Gabriele Bavota, et al. A Large-Scale Empirical Study on Self-Admitted Technical Debt, 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).

[2] Andrew P. Black, et al. Refactoring Tools: Fitness for Purpose, 2008, IEEE Software.

[3] Andrea De Lucia, et al. An Exploratory Study on the Relationship between Changes and Refactoring, 2017, 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC).

[4] Charles A. Sutton, et al. Mining source code repositories at massive scale using language modeling, 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[5] Michele Lanza, et al. On the nature of commits, 2008, 2008 23rd IEEE/ACM International Conference on Automated Software Engineering - Workshops.

[6] Senén Barro, et al. Do we need hundreds of classifiers to solve real world classification problems?, 2014, J. Mach. Learn. Res.

[7] Andrei Z. Broder, et al. On the resemblance and containment of documents, 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).

[8] Ralph E. Johnson, et al. Automated Detection of Refactorings in Evolving Components, 2006, ECOOP.

[9] Daniel M. Germán, et al. What do large commits tell us?: a taxonomical study of large commits, 2008, MSR '08.

[10] Stephan Diehl, et al. Comparison of similarity metrics for refactoring detection, 2011, MSR '11.

[11] Shinji Kusumoto, et al. Hey! are you committing tangled changes?, 2014, ICPC 2014.

[12] Eleni Stroulia, et al. A multidimensional empirical study on refactoring activity, 2013, CASCON.

[13] Danny Dig, et al. Accurate and Efficient Refactoring Detection in Commit History, 2018, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[14] Miryung Kim, et al. Ref-Finder: a refactoring reconstruction tool based on logic query templates, 2010, FSE '10.

[15] Andrew P. Black, et al. How We Refactor, and How We Know It, 2012, IEEE Trans. Software Eng.

[16] Ling Xu, et al. Automatically classifying software changes via discriminative topic model: Supporting multi-category and cross-project, 2016, J. Syst. Softw.

[17] Stephan Diehl, et al. Highly Configurable and Extensible Code Clone Detection, 2010, 2010 17th Working Conference on Reverse Engineering.

[18] Marco Tulio Valente, et al. Why we refactor? confessions of GitHub contributors, 2016, SIGSOFT FSE.

[19] Siau-Cheng Khoo, et al. Semantic patch inference, 2012, 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering.

[20] Sumit Gulwani, et al. Learning Syntactic Program Transformations from Examples, 2016, 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE).

[21] Gregorio Robles, et al. Discriminating Development Activities in Versioning Systems: A Case Study, 2006.

[22] L. Erlikh, et al. Leveraging legacy system dollars for e-business, 2000.

[23] Emad Shihab, et al. An Exploratory Study on Self-Admitted Technical Debt, 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[24] Thomas Grechenig, et al. Dataset of Developer-Labeled Commit Messages, 2015, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.

[25] Eleni Stroulia, et al. UMLDiff: an algorithm for object-oriented design differencing, 2005, ASE.

[26] Premkumar T. Devanbu, et al. A large scale study of programming languages and code quality in github, 2014, SIGSOFT FSE.

[27] Miryung Kim, et al. An Empirical Study of Refactoring Challenges and Benefits at Microsoft, 2014, IEEE Transactions on Software Engineering.

[28] Ahmed E. Hassan, et al. Automated classification of change messages in open source projects, 2008, SAC '08.

[29] Amiram Yehudai, et al. Boosting Automatic Commit Classification Into Maintenance Activities By Utilizing Source Code Changes, 2017, PROMISE.

[30] Anas Abdin, et al. Empirical Evaluation of the Impact of Object-Oriented Code Refactoring on Quality Attributes: A Systematic Literature Review, 2018, IEEE Transactions on Software Engineering.

[31] Collin McMillan, et al. Categorizing software applications for maintenance, 2011, 2011 27th IEEE International Conference on Software Maintenance (ICSM).

[32] Rusli Abdullah, et al. Text-based classification incoming maintenance requests to maintenance type, 2010, 2010 International Symposium on Information Technology.

[33] Michael W. Godfrey, et al. Automated topic naming to support cross-project analysis of software maintenance activities, 2011, MSR '11.

[34] Thomas Grechenig, et al. Tracing Your Maintenance Work - A Cross-Project Validation of an Automated Classification Dictionary for Commit Messages, 2012, FASE.

[35] Patricia J. Guinan, et al. Enabling Software Development Team Performance During Requirements Definition: A Behavioral Versus Technical Approach, 1998, Inf. Syst. Res.

[36] Marco Tulio Valente, et al. RefDiff: Detecting Refactorings in Version Histories, 2017, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[37] Mauricio A. Saca. Refactoring improving the design of existing code, 2017, 2017 IEEE 37th Central America and Panama Convention (CONCAPAN XXXVII).

[38] Stephan Diehl, et al. Identifying Refactorings from Source-Code Changes, 2006, 21st IEEE/ACM International Conference on Automated Software Engineering (ASE'06).

[39] David Lo, et al. Automatic Fine-Grained Issue Report Reclassification, 2014, 2014 19th International Conference on Engineering of Complex Computer Systems.

[40] Eleni Stroulia, et al. The JDEvAn tool suite in support of object-oriented evolutionary development, 2008, ICSE Companion '08.

[41] Alessandro F. Garcia, et al. How does refactoring affect internal quality attributes?: A multi-project study, 2017, SBES'17.

[42] Miryung Kim, et al. Template-based reconstruction of complex refactorings, 2010, 2010 IEEE International Conference on Software Maintenance.

[43] E. Burton Swanson, et al. The dimensions of maintenance, 1976, ICSE '76.

[44] Yi Wang, et al. What motivate software engineers to refactor source code? evidences from professional developers, 2009, 2009 IEEE International Conference on Software Maintenance.

[45] Shinpei Hayashi, et al. Search-Based Refactoring Detection from Source Code Revisions, 2010, IEICE Trans. Inf. Syst.

[46] Michael W. Godfrey, et al. Automatic classification of large changes into maintenance categories, 2009, 2009 IEEE 17th International Conference on Program Comprehension.

[47] Miryung Kim, et al. Lase: Locating and applying systematic edits by learning from examples, 2013, 2013 35th International Conference on Software Engineering (ICSE).