Scalable detection of missed cross-function refactorings

Refactoring is an important way to improve the design of existing code. Identifying refactoring opportunities (i.e., code fragments that can be refactored) in large code bases is a challenging task. In this paper, we propose a novel, automated, and scalable technique for identifying cross-function refactoring opportunities, i.e., those that span more than one function (e.g., Extract Method and Inline Method). The key to our technique is the design of efficient vector inlining operations that emulate the effect of method inlining among code fragments, so that the problem of identifying cross-function refactorings is reduced to the problem of finding similar vectors before and after inlining. We have implemented our technique in a prototype tool named ReDex, which encodes Java programs into vectors. We have applied the tool to a large code base of 4.5 million lines of code, comprising 200 bundle projects in the Eclipse ecosystem (e.g., Eclipse JDT, Eclipse PDE, Apache Commons, and Hamcrest). Unlike many other studies on refactoring detection, ReDex searches only for code fragments that can be, but have not yet been, refactored in a way similar to some refactoring that has already happened in the code base. Our results show that ReDex can find 277 cross-function refactoring opportunities in 2 minutes. Users labelled 223 of these as true opportunities, which cover many categories of cross-function refactoring operations in classical refactoring books, such as Self Encapsulate Field, Decompose Conditional Expression, Hide Delegate, and Preserve Whole Object.
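The idea of vector inlining can be illustrated with a minimal sketch. The abstract does not specify ReDex's actual vector encoding, so the following assumes a simple count vector over syntactic features; the feature names (`call`, `if`, `compare`, `and`, `return`) and the helpers `inline_vector` and `distance` are hypothetical and chosen only for illustration. Inlining a call is emulated by removing one call-site count from the caller's vector and adding the callee's vector; an unrefactored fragment is then a candidate opportunity if its vector is close to the inlined vector.

```python
from collections import Counter
from math import sqrt


def inline_vector(caller: Counter, callee: Counter, call_feature: str = "call") -> Counter:
    """Emulate inlining one call on count vectors (illustrative assumption,
    not ReDex's actual encoding): drop one call-site count from the caller
    and add the callee's feature counts."""
    if caller[call_feature] < 1:
        raise ValueError("caller has no call site to inline")
    result = Counter(caller)
    result[call_feature] -= 1
    result.update(callee)
    return +result  # unary plus drops features whose count fell to zero


def distance(a: Counter, b: Counter) -> float:
    """Euclidean distance between two sparse count vectors."""
    keys = set(a) | set(b)
    return sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


# Hypothetical vectors: a caller that was already refactored via Extract Method,
# the extracted callee, and a separate fragment that was never refactored.
refactored_caller = Counter({"call": 1, "return": 1})
extracted_callee = Counter({"if": 1, "compare": 2, "and": 1})
unrefactored = Counter({"if": 1, "compare": 2, "and": 1, "return": 1})

inlined = inline_vector(refactored_caller, extracted_callee)
print(distance(refactored_caller, unrefactored))  # nonzero before inlining
print(distance(inlined, unrefactored))            # 0.0 after inlining
```

In this toy setting the unrefactored fragment looks dissimilar to the refactored caller until the callee is inlined at the vector level, at which point the two vectors coincide. A scalable implementation would presumably replace the pairwise `distance` check with an approximate nearest-neighbor search, but that choice is also an assumption here.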
