How we refactor, and how we know it

Much of what we know about how programmers refactor in the wild is based on studies that examine just a few software projects. Researchers have rarely taken the time to replicate these studies in other contexts or to examine the assumptions on which they are based. To help put refactoring research on a sound scientific basis, we draw conclusions using four data sets spanning more than 13 000 developers, 240 000 tool-assisted refactorings, 2500 developer hours, and 3400 version control commits. Using these data, we cast doubt on several previously stated assumptions about how programmers refactor, while validating others. For example, we find that programmers frequently do not indicate refactoring activity in commit logs, which contradicts assumptions made by several previous researchers. In contrast, we were able to confirm the assumption that programmers do frequently intersperse refactoring with other program changes. By confirming assumptions and replicating studies made by other researchers, we can have greater confidence that those researchers' conclusions are generalizable.
