A Program Plagiarism Detection Model Based on Information Distance and Clustering

Plagiarism in students' programming assignments creates considerable difficulties for course designers, so detecting it efficiently is important to the educational process. This paper proposes a metric, based on information distance, to measure the similarity between two programs. In addition, clustering analysis based on shared near neighbors is applied to provide more detailed and useful information about the plagiarism. Experimental results show that our software has clear advantages over other plagiarism detection systems and relieves teachers of time-consuming, laborious checking.

Key words: Program plagiarism, Detection, Information distance, Clustering
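The abstract does not spell out the metric or the clustering procedure, so the following is only a rough sketch of the two ideas under stated assumptions. Information distance is defined through Kolmogorov complexity, which is not computable, so the sketch substitutes the normalized compression distance, with zlib as a stand-in compressor. The grouping step is a simplified variant of shared-near-neighbor clustering. The function names and the parameters `k` and `kt` are hypothetical and are not taken from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input, a crude proxy for its complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def shared_neighbor_clusters(dist, k, kt):
    """Group items by shared near neighbors (simplified; assumption, not the paper's procedure).

    dist is a square matrix of pairwise distances. Two items are joined when each
    appears in the other's k-nearest-neighbor list and the two lists have at least
    kt members in common. Returns a sorted list of clusters of item indices.
    """
    n = len(dist)
    # k nearest neighbors of each item, excluding the item itself
    neighbors = [
        set(sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k])
        for i in range(n)
    ]
    parent = list(range(n))  # union-find forest over item indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = i in neighbors[j] and j in neighbors[i]
            if mutual and len(neighbors[i] & neighbors[j]) >= kt:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In use, one would compute `ncd` for every pair of submissions, place the values in a matrix, and pass that matrix to `shared_neighbor_clusters`; submissions that end up in the same cluster are candidates for manual review. A practical system would likely compare normalized token streams rather than raw source bytes, so that renaming identifiers or reformatting code has less effect on the distance.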
