An Extended Stable Marriage Problem Algorithm for Clone Detection

Code cloning negatively affects industrial software and threatens intellectual property. This paper presents a novel approach to detecting cloned software by using a bijective matching technique. The proposed approach focuses on increasing the range of similarity measures and thus enhancing the precision of detection. This is achieved by extending the well-known stable marriage problem (SMP) and demonstrating how matches between code fragments of different files can be expressed. A prototype of the proposed approach is demonstrated on a suitable scenario and shows a noticeable improvement in several aspects of clone detection, such as scalability and accuracy.
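To make the matching idea concrete, the following is a minimal sketch of the classic (unextended) Gale-Shapley stable marriage algorithm applied to code fragments from two files. It is not the extension proposed in the paper. The function names `similarity` and `stable_match`, the use of `difflib.SequenceMatcher` as the similarity measure, and the restriction to equally sized fragment lists are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Textual similarity in [0, 1]; a stand-in for any clone similarity measure."""
    return SequenceMatcher(None, a, b).ratio()


def stable_match(left: list[str], right: list[str]) -> dict[int, int]:
    """Classic Gale-Shapley: fragments in `left` propose to fragments in `right`.

    Assumes both lists have the same length (hypothetical simplification).
    Returns a bijection {left_index: right_index}.
    """
    n = len(left)
    if n != len(right):
        raise ValueError("this sketch assumes equally sized fragment lists")
    score = [[similarity(a, b) for b in right] for a in left]
    # Each left fragment ranks the right fragments by decreasing similarity.
    prefs = [sorted(range(n), key=lambda j, i=i: -score[i][j]) for i in range(n)]
    next_choice = [0] * n          # next position in prefs[i] to propose to
    partner_of_right = [None] * n  # current left partner of each right fragment
    free = list(range(n))          # left fragments that are still unmatched
    while free:
        i = free.pop()
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        current = partner_of_right[j]
        if current is None:
            partner_of_right[j] = i
        elif score[i][j] > score[current][j]:
            # The right fragment prefers the new proposer; release the old one.
            partner_of_right[j] = i
            free.append(current)
        else:
            free.append(i)
    return {i: j for j, i in enumerate(partner_of_right)}


if __name__ == "__main__":
    file_a = ["int sum(int a, int b) { return a + b; }",
              "void log(char *m) { puts(m); }"]
    file_b = ["void log_msg(char *msg) { puts(msg); }",
              "int add(int x, int y) { return x + y; }"]
    for i, j in sorted(stable_match(file_a, file_b).items()):
        print(f"{file_a[i]!r} <-> {file_b[j]!r}")
```

In this sketch both sides derive their preferences from the same symmetric similarity score, so the result is a one-to-one pairing in which no two unmatched fragments are more similar to each other than to their assigned partners. A real detector would presumably replace the textual ratio with token-, tree-, or metric-based measures and would need a rule for files with different numbers of fragments.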
