Software Clone Detection Using Clustering Approach

Code clones are identical or highly similar code segments. Identifying clones helps improve software quality by supporting managed evolution, refactoring, and complexity reduction. In this study, we detect Type 1 and Type 2 function clones using a data mining technique. First, we create a dataset by collecting metrics for every function in a software system. Second, we apply the DBSCAN clustering algorithm to the dataset so that each cluster can be analysed to detect Type 1 and Type 2 function clones. We evaluate our approach on Bitmessage, an open source software system, and calculate precision to measure its effectiveness in detecting function clones. The results show that the approach is effective for function clone detection, achieving high precision and a high number of detected function clones.
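The two-step pipeline described in the abstract (per-function metrics, then DBSCAN clustering) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the study: the four metrics (lines of code, parameter count, call count, cyclomatic complexity), the function names, the values of `eps` and `min_pts`, and the pair-based `precision` helper. The sketch relies on the observation that Type 1 clones (identical apart from whitespace and comments) and Type 2 clones (additionally differing in identifiers or literals) tend to have matching structural metrics, so they fall into the same dense cluster. Each cluster is only a set of candidates that still needs to be analysed.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_neighbours)
    return labels

def precision(detected, confirmed):
    """Fraction of detected clone pairs that were confirmed as true clones."""
    if not detected:
        return 0.0
    return len(detected & confirmed) / len(detected)

# Hypothetical metric vectors:
# (lines of code, parameters, calls made, cyclomatic complexity)
metrics = {
    "send_msg":     (12, 2, 3, 4),
    "send_ack":     (12, 2, 3, 4),
    "parse_header": (30, 1, 7, 9),
    "parse_footer": (30, 1, 7, 9),
    "main":         (55, 0, 12, 6),
}
names = list(metrics)
points = [metrics[n] for n in names]

# Assumed parameters: a small eps keeps only near-identical metric vectors together.
labels = dbscan(points, eps=0.5, min_pts=2)

# Group function names by cluster; each group is a set of clone candidates.
candidates = {}
for name, label in zip(names, labels):
    if label != NOISE:
        candidates.setdefault(label, []).append(name)

print(candidates)
```

In this toy dataset the two `send_*` functions and the two `parse_*` functions form separate clusters, while `main` is labelled as noise. In practice the metric set and the `eps` threshold would need tuning, since unrelated functions can share metric values by coincidence, which is why precision over manually confirmed clones is a natural evaluation measure.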
