Why Developers Refactor Source Code: A Mining-based Study

Refactoring aims to improve the non-functional attributes of code without modifying its external behavior. Previous studies investigated the motivations behind refactoring by surveying developers. To generalize and complement their findings, we present a large-scale study that quantitatively and qualitatively investigates why developers refactor in open source projects. First, we mine 287,813 refactoring operations performed in the history of 150 systems. Using this dataset, we investigate the interplay between refactoring operations and both process metrics (e.g., previous changes/fixes) and product metrics (e.g., quality metrics). Then, we manually analyze 551 merged pull requests implementing refactoring operations and classify the motivations behind the implemented refactorings (e.g., removal of code duplication). Our results provide (i) quantitative evidence of the relationship between certain process/product metrics and refactoring operations; and (ii) a detailed taxonomy, generalizing and complementing those in the literature, of the motivations that push developers to refactor source code.
