Vector abstraction and concretization for scalable detection of refactorings

Automated techniques have been proposed either to identify refactoring opportunities (i.e., code fragments that could be, but have not yet been, restructured in a program) or to reconstruct historical refactorings (i.e., code restructuring operations that have taken place between different versions of a program). In this paper, we propose a new technique that can detect both refactoring opportunities and historical refactorings in large code bases. The key to our technique is the design of vector abstraction and concretization operations that can encode the code changes induced by certain refactorings as characteristic vectors. The problem of identifying refactorings thus reduces to the problem of identifying matching vectors, which can be solved efficiently. We have implemented our technique for Java. Applied to 200 bundle projects from the Eclipse ecosystem containing 4.5 million lines of code, the prototype reports more than 32K instances of 17 types of refactoring opportunities in total, taking 25 minutes on average per type. Applied to 14 versions of 3 smaller programs (JMeter, Ant, XML-Security), it detects (1) more than 2.8K refactoring opportunities within individual versions, with a precision of about 87%, and (2) more than 190 historical refactorings across consecutive versions, with a precision of about 92%.
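To make the reduction from refactorings to vector matching more concrete, here is a minimal sketch under strong simplifying assumptions. It is not the paper's algorithm. A code fragment is modeled as a flat list of node-kind labels rather than an AST, and the vocabulary `KINDS`, the `TEMPLATES` table and the functions `abstract`, `change_vector` and `classify` are all hypothetical. The sketch shows only how counting node kinds turns a code change into a vector, and how a refactoring type can then be recognized with a single hash-table lookup instead of a structural comparison of code.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical vocabulary of node kinds; a real tool would use the AST
# node types of the target language (the prototype targets Java).
KINDS: Tuple[str, ...] = ("method_decl", "call", "assign", "if", "loop", "return")

Vector = Tuple[int, ...]


def abstract(fragment: List[str]) -> Vector:
    """Vector abstraction: count how often each node kind occurs."""
    counts = Counter(fragment)
    return tuple(counts[kind] for kind in KINDS)


def change_vector(before: List[str], after: List[str]) -> Vector:
    """Encode a change as the element-wise difference after - before."""
    return tuple(a - b for a, b in zip(abstract(after), abstract(before)))


# Hypothetical templates: the change vector each refactoring type is assumed
# to induce. Moving statements into a new void method adds one method
# declaration and one call; the moved statements themselves cancel out.
TEMPLATES: Dict[Vector, str] = {
    (1, 1, 0, 0, 0, 0): "Extract Method",
    (-1, -1, 0, 0, 0, 0): "Inline Method",
}


def classify(before: List[str], after: List[str]) -> Optional[str]:
    """Match a change against the templates with one hash-table lookup."""
    return TEMPLATES.get(change_vector(before, after))


if __name__ == "__main__":
    before = ["method_decl", "assign", "if", "assign", "return"]
    # The three middle statements move into a new void method, replaced by a call.
    after = ["method_decl", "call", "return",
             "method_decl", "assign", "if", "assign"]
    print(classify(before, after))  # Extract Method
    print(classify(after, before))  # Inline Method
```

Exact lookup keeps matching cheap, but it is brittle. For example, an extracted method that also gains a `return` produces a different vector. A practical detector would presumably need several templates per refactoring type or some tolerance in matching. The sketch makes no claim about how the prototype handles this, and it does not attempt the concretization step named above.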