Automated Detection of Code Duplication Clusters

By eliminating the duplicates, you ensure that the code says everything once and only once, which is the essence of good design (Once And Only Once Rule).

Acknowledgments

I am very proud to have Dr. Radu Marinescu as my advisor. I am profoundly indebted to him for his trust in me and for seeing my skills even when I was sceptical about them. Thanks to his constant encouragement I came to understand that research is not something unreachable and far above us, but rather something that can be done with hard work and enthusiasm. Being around him is a constant source of inspiration to me.

The experiences I've been through in the LOOSE Research Group made me feel fortunate. The great ideas that Dani Rațiu had, expressed in his rather reserved way due to modesty, helped me many times over the course of this project (by the way, pair programming is really fun), as did Pepi's constant availability to discuss algorithms (and to share the scalability progress!). I learned a lot with you guys.

I would also like to thank Cristina and everyone else who tested the tool, starting with its early versions. Without the feedback I got from her, DuDe would never have become that reliable. And to my lovely wife, Simy, for always being there for me.

Chapter 1. Introduction

All systems change during their life-cycles. This must be borne in mind when developing systems expected to last longer than the first version. All software systems are subject to continuous evolution and maintenance activities in order to eliminate defects and extend their functionality. We need to deal with code duplication in order to prevent the problems that appear when adapting to the changes that are inevitable in a real software system (one that survives its first release).

Definition 1.1 (Code clone) A code clone is a code portion in source files that is identical or similar to another [KKI02].
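To make Definition 1.1 concrete, the following is a minimal sketch of finding "identical" code portions, assuming a clone is a run of at least `min_lines` consecutive non-blank lines that reappears elsewhere after whitespace normalization. It is an illustration only, not the algorithm used by DuDe; the names `normalize`, `find_exact_clones`, and `min_lines` are hypothetical, and "similar" (non-identical) clones are not handled.

```python
from collections import defaultdict


def normalize(line: str) -> str:
    """Collapse whitespace so formatting-only differences are ignored."""
    return " ".join(line.split())


def find_exact_clones(source: str, min_lines: int = 3) -> dict:
    """Map each window of `min_lines` consecutive non-blank lines that occurs
    more than once to the 1-based line numbers where each occurrence starts.

    Assumption: a "clone" here is an exact match after whitespace
    normalization; this is a simplification of Definition 1.1.
    """
    # Keep (original line number, normalized text) for non-blank lines only.
    lines = [
        (number, normalize(text))
        for number, text in enumerate(source.splitlines(), start=1)
        if text.strip()
    ]
    windows = defaultdict(list)
    for k in range(len(lines) - min_lines + 1):
        chunk = tuple(text for _, text in lines[k:k + min_lines])
        windows[chunk].append(lines[k][0])
    # Only windows seen at two or more positions count as clones.
    return {chunk: starts for chunk, starts in windows.items() if len(starts) > 1}
```

Calling `find_exact_clones(text)` on the contents of a source file returns the duplicated windows together with their starting lines; overlapping windows are reported separately, so a longer clone shows up as several adjacent entries.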
Code duplications (or code clones) appear for a variety of reasons:
• Code reuse by copying an existing solution
• Failure to identify or use abstract data types
• Performance enhancement
• Accidents

Misunderstood code reuse occurs when developers systematically copy existing code that solved a problem similar to the one they are currently trying to solve. Programmers intent on implementing new functionality find some working code that performs a computation nearly identical to …

[1]  Christopher Alexander,et al.  The Timeless Way of Building , 1979 .

[2]  Samuel L. Grier,et al.  A tool that detects plagiarism in Pascal programs , 1981, SIGCSE '81.

[3]  Hugo T. Jankowitz.  Detecting Plagiarism in Student Pascal Programs , 1988, Comput. J.

[4]  簡聰富,et al.  A Study of Object-Oriented Software Construction , 1989 .

[5]  Kent L. Beck,et al.  Extreme programming explained - embrace change , 1990 .

[6]  Brenda S. Baker,et al.  A Program for Identifying Duplicated Code , 1992 .

[7]  Kenneth Ward Church,et al.  Dotplot : a program for exploring self-similarity in millions of lines of text and code , 1993 .

[8]  Ivar Jacobson,et al.  Object-oriented software engineering - a use case driven approach , 1993, TOOLS.

[9]  Ralph E. Johnson,et al.  Refactoring and Aggregation , 1993, ISOTAS.

[10]  Ralph Johnson,et al.  Design Patterns: Elements of Reusable Object-Oriented Software , 2019 .

[11]  Brenda S. Baker,et al.  On finding duplication and near-duplication in large software systems , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.

[12]  Jonathan Helfman,et al.  Dotplot Patterns: A Literal Look at Pattern Languages , 1996, Theory Pract. Object Syst..

[13]  Ettore Merlo,et al.  Experiment on the automatic detection of function clones in a software system using metrics , 1996, 1996 Proceedings of International Conference on Software Maintenance.

[14]  Ettore Merlo,et al.  Assessing the benefits of incorporating function clone detection in a development process , 1997, 1997 Proceedings International Conference on Software Maintenance.

[15]  L. D. Moura,et al.  Clone detection using abstract syntax trees , 1998, Proceedings. International Conference on Software Maintenance (Cat. No. 98CB36272).

[16]  Stéphane Ducasse,et al.  Visual Detection of Duplicated Code , 1998, ECOOP Workshops.

[17]  Stéphane Ducasse,et al.  A language independent approach for detecting duplicated code , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[18]  Serge Demeyer,et al.  The FAMOOS Object-Oriented Reengineering Handbook , 1999 .

[19]  Jens Krinke,et al.  Identifying similar code with program dependence graphs , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[20]  Robert C. Martin.  Agile Software Development, Principles, Patterns, and Practices , 2002 .

[21]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[22]  Rainer Koschke,et al.  Vergleich von Techniken zur Erkennung duplizierten Quellcodes , 2002 .

[23]  Giuliano Antoniol,et al.  Complexity and Feasibility Issues in Object Oriented Clone Detection , 2003 .

[24]  Michael W. Godfrey,et al.  A Taxonomy of Clones in Source Code: The Re–Engineers Most Wanted List , 2003 .

[25]  Michael W. Godfrey,et al.  Toward a Taxonomy of Clones in Source Code: A Case Study , 2003 .

[26]  Andrew Walenstein,et al.  Clone Detector Evaluation Can Be Improved: Ideas from Information Retrieval , 2003 .

[27]  Radu Marinescu,et al.  Measurement and Quality in Object-Oriented Design , 2005, ICSM.