Paper information - Plagiarism and Collusion Detection using the Smith-Waterman Algorithm

We investigate the use of variants of the Smith-Waterman algorithm to locate similarities in texts and in program source code, with a view to their application in the detection of plagiarism and collusion. The Smith-Waterman algorithm is a classical tool for identifying and quantifying local similarities in biological sequences, but we demonstrate that somewhat different issues arise in this new context, and that these differences can be exploited to yield significant speed-up in practice. We include empirical evidence to indicate the practicality of the approach and to illustrate the efficiency gains.
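
For orientation, the classical Smith-Waterman recurrence that the abstract refers to can be sketched as below. This is a minimal illustration of the textbook algorithm only, not the variants or speed-ups studied in the paper. The function name `smith_waterman` and the scoring values (`match=1`, `mismatch=-1`, `gap=-1`) are assumptions chosen for the example.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best_score, (end_i, end_j)) for the best local alignment.

    `a` and `b` may be strings or lists of tokens. The scoring values are
    illustrative assumptions, not parameters taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score of a local alignment that ends
    # exactly at a[i-1] and b[j-1]; row 0 and column 0 stay at zero.
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            # Clamping at zero is what makes the alignment local:
            # a poor prefix is dropped rather than carried forward.
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    # Character-level comparison of two short texts.
    print(smith_waterman("xxabcdyy", "zzabcdww"))
    # Token-level comparison, as one might do for program source code.
    print(smith_waterman("int a = b + c ;".split(), "int x = b + c ;".split()))
```

This sketch fills the full table in time and space proportional to the product of the two input lengths. It reports only the score and end position of the best local alignment, and recovering the aligned region itself would require a traceback step that is omitted here.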

Robert W. Irving
