CHECK: a document plagiarism detection system

Digital documents are vulnerable to being copied. Most existing copy detection prototypes employ an exhaustive sentence-based comparison method, comparing a potentially plagiarized document against a repository of legal or original documents to identify plagiarism. This approach is not scalable due to the potentially large number of original documents and the large number of sentences in each document. Furthermore, the security level of existing mechanisms is quite weak; a plagiarized document could bypass the detection mechanisms simply by making a minor modification to each sentence. In this paper, we propose a copy detection mechanism that eliminates unnecessary comparisons. It is based on the observation that comparisons between two documents addressing different subjects are not necessary. We describe the design and implementation of our experimental prototype, called CHECK. The results of some exploratory experiments are presented, and the security level of our mechanism is discussed.
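The idea of skipping comparisons between documents on different subjects can be illustrated with a minimal sketch. The abstract does not specify how CHECK determines a document's subject, so the sketch below substitutes a simple keyword-set Jaccard overlap as a stand-in; the function names, the stopword list, and the threshold of 0.2 are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "with", "that", "this"}


def keywords(text):
    """Return the set of lowercase alphabetic tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def subject_similarity(a, b):
    """Jaccard overlap of the two keyword sets, a value in [0, 1]."""
    ka, kb = keywords(a), keywords(b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def candidates(suspect, repository, threshold=0.2):
    """Names of repository documents whose subject overlaps the suspect's.

    Only these documents would go on to the expensive sentence-level
    comparison; the rest are skipped. The threshold is an assumed value.
    """
    return [name for name, text in repository.items()
            if subject_similarity(suspect, text) >= threshold]


repository = {
    "db": "Query optimization in relational database systems uses indexes and join ordering.",
    "cooking": "Simmer the tomato sauce and add basil before serving pasta.",
}
suspect = "Relational database systems use indexes and join ordering for query optimization."
selected = candidates(suspect, repository)
```

With this filter in front, the sentence-based comparison runs only against the documents in `selected`, so the cost grows with the number of same-subject documents rather than with the size of the whole repository.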