Measuring Whitespace Pattern Sequences as an Indication of Plagiarism

There are several methods and technologies for efficiently identifying similarity by comparing the statements, comments, strings, identifiers, and other visible elements of source code. In a prior paper [2] we found that comparing whitespace patterns alone was not precise enough to identify copying. That paper also proposed several ways to improve the precision of whitespace pattern comparison, the most promising of which was to examine sequences of lines with matching whitespace patterns. This paper presents a method for evaluating sequences of matching whitespace patterns, along with a detailed study of the method’s reliability.
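The idea of looking for sequences of lines with matching whitespace patterns can be illustrated with a short sketch. This abstract does not define exactly what a "whitespace pattern" is or how sequences are scored, so the code below rests on stated assumptions. It assumes that a line's pattern keeps its whitespace characters verbatim and collapses each run of non-whitespace text into a single `X` marker, and it reports maximal runs of consecutive lines whose patterns match between two files. The function names `whitespace_pattern` and `matching_sequences` and the `min_len` threshold are hypothetical, not taken from the paper.

```python
import re
from typing import List, Tuple


def whitespace_pattern(line: str) -> str:
    # Assumption: keep every whitespace character as-is and replace each
    # run of non-whitespace characters with a single "X" marker.
    return re.sub(r"\S+", "X", line.rstrip("\r\n"))


def matching_sequences(a: List[str], b: List[str],
                       min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Return (start_a, start_b, length) for every maximal run of
    consecutive lines whose whitespace patterns match, keeping only
    runs of at least min_len lines."""
    pa = [whitespace_pattern(s) for s in a]
    pb = [whitespace_pattern(s) for s in b]
    # run[i][j] = length of the matching run that ends at a[i-1] and b[j-1]
    run = [[0] * (len(pb) + 1) for _ in range(len(pa) + 1)]
    for i in range(1, len(pa) + 1):
        for j in range(1, len(pb) + 1):
            if pa[i - 1] == pb[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
    found = []
    for i in range(1, len(pa) + 1):
        for j in range(1, len(pb) + 1):
            n = run[i][j]
            # A run is maximal when the next pair of lines does not extend it.
            at_end = i == len(pa) or j == len(pb) or run[i + 1][j + 1] == 0
            if n >= min_len and at_end:
                found.append((i - n, j - n, n))
    return found


# Usage: identifiers are renamed, but the layout of every line is unchanged.
original = [
    "def area(r):",
    "    pi = 3.14159",
    "    return pi * r * r",
    "",
    "print(area(2))",
]
suspect = [
    "def circle_area(radius):",
    "    const = 3.14159",
    "    return const * radius * radius",
    "",
    "print(circle_area(2))",
]
print(matching_sequences(original, suspect))  # → [(0, 0, 5)]
```

Because renaming identifiers leaves the spacing of each line intact, the five-line run still matches. Under this simple definition, blank lines and lines with no internal whitespace also match one another trivially, so a practical scorer would probably need to weight or discount such low-information patterns.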

References

[1] Michael J. Wise et al. YAP3: improved detection of similarities in computer program and other texts. SIGCSE '96, 1996.

[2] Nikolaus Baer et al. Measuring Whitespace Patterns as an Indication of Plagiarism. 2010.

[3] Hugo T. Jankowitz. Detecting Plagiarism in Student Pascal Programs. Comput. J., 1988.

[4] J. Dillinger. FINGERPRINTS. 1938.

[5] Robert Zeidman. Software Source Code Correlation. 5th IEEE/ACIS International Conference on Computer and Information Science and 1st IEEE/ACIS International Workshop on Component-Based Software Engineering, Software Architecture and Reuse (ICIS-COMSAR'06), 2006.

[6] Georgina Cosma et al. An Approach to Source-Code Plagiarism Detection and Investigation Using Latent Semantic Analysis. IEEE Transactions on Computers, 2012.

[7] Zhoujun Li et al. BUAA_AntiPlagiarism: A System To Detect Plagiarism for C Source Code. 2009 International Conference on Computational Intelligence and Software Engineering, 2009.

[8] Nikolaus Baer et al. Measuring Software Evolution with Changing Lines of Code. CATA, 2009.

[9] Upul Bandara et al. A Machine Learning Based Tool for Source Code Plagiarism Detection. 2011.

[10] Daniel Shawcross Wilkerson et al. Winnowing: local algorithms for document fingerprinting. SIGMOD '03, 2003.

[11] Ettore Merlo et al. Detection of Plagiarism in University Projects Using Metrics-based Spectral Similarity. Duplication, Redundancy, and Similarity in Software, 2006.

[12] Robert Zeidman. Multidimensional Correlation of Software Source Code. 2008 Third International Workshop on Systematic Approaches to Digital Forensic Engineering, 2008.

[13] Mike Joy et al. Source-code Plagiarism: a UK Academic Perspective. 2006.

[14] Claude W. Anderson et al. Plagiarism Detection in Computer Code. 2005.

[15] Baojiang Cui et al. Type Redefinition Plagiarism Detection of Token-Based Comparison. 2010 International Conference on Multimedia Information Networking and Security, 2010.

[16] Brenda S. Baker et al. On finding duplication and near-duplication in large software systems. Proceedings of 2nd Working Conference on Reverse Engineering, 1995.

[17] Charlie Daly et al. A Technique for Detecting Plagiarism in Computer Code. Comput. J., 2005.

[18] James O. Hamblen et al. Computer algorithms for plagiarism detection. 1989.

[19] Michael Philippsen et al. Finding Plagiarisms among a Set of Programs with JPlag. J. Univers. Comput. Sci., 2002.