Code similarity detection has been studied for several decades, and existing approaches are commonly categorized
as attribute-counting or structure-metric. Because attribute-counting is effective only for detecting full
replication, mature systems usually use the Greedy String Tiling (GST) string matching algorithm to compare code
structure. However, the accuracy of GST is vulnerable to interference. This paper presents a code similarity
detection method that combines string matching and sub-graph isomorphism. The similarity is first calculated with
the GST algorithm. Then, according to that similarity, the system determines whether further processing with the
sub-graph isomorphism algorithm is required. Extensive experimental results show that our method significantly
enhances the efficiency of string matching as well as the accuracy of code similarity detection.
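
The two-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a basic Greedy String Tiling loop without the Karp-Rabin speed-up, a coverage-based similarity score that is a common choice but is assumed here, and a hypothetical decision rule (`needs_graph_check` with made-up thresholds `low` and `high`), since the abstract does not state how the similarity triggers the sub-graph isomorphism stage. The sub-graph isomorphism step itself is left out.

```python
def greedy_string_tiling(a, b, min_match=3):
    """Return tiles (start_a, start_b, length) of maximal non-overlapping matches."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_match = min_match
        matches = []
        # Phase 1: find the longest common runs made only of unmarked tokens.
        for i in range(len(a)):
            if marked_a[i]:
                continue
            for j in range(len(b)):
                if marked_b[j]:
                    continue
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_match:
                    matches = [(i, j, k)]
                    max_match = k
                elif k == max_match:
                    matches.append((i, j, k))
        if not matches:
            break
        # Phase 2: turn matches into tiles, skipping any that overlap
        # a tile already placed in this round.
        for i, j, k in matches:
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for t in range(k):
                marked_a[i + t] = True
                marked_b[j + t] = True
            tiles.append((i, j, k))
    return tiles


def gst_similarity(a, b, min_match=3):
    """Assumed score: share of tokens in both sequences covered by tiles."""
    if not a and not b:
        return 0.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 2 * covered / (len(a) + len(b))


def needs_graph_check(similarity, low=0.3, high=0.8):
    """Hypothetical rule: only ambiguous scores go on to the graph stage."""
    return low <= similarity < high
```

In practice `a` and `b` would be token sequences produced by a lexer rather than raw characters, and a pair for which `needs_graph_check` returns true would be handed to a sub-graph isomorphism test on some graph representation of the two programs.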