Educated guesses and equality judgements: using search engines and pairwise match for external plagiarism detection. Notebook for PAN at CLEF 2012
This paper describes the approaches taken to the two subtasks of Candidate Document Retrieval and Detailed Comparison, in the Plagiarism Detection track at PAN 12. For the first of these, we describe how we used a combination of frequency and a variation of a contrastive corpus measure to select keywords with which to make queries to the ChatNoir search system; for the second, we provide an overview of how we re-used software that had previously featured in PAN 11. We comment specifically on how effective both approaches were, and what steps we might take to improve if the competition remains substantially similar next time.
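The keyword-selection idea for the Candidate Document Retrieval subtask can be illustrated with a minimal sketch. The abstract does not give the exact contrastive measure, so the scoring below (raw frequency multiplied by the ratio of a word's relative frequency in the document to its smoothed relative frequency in a reference corpus) is an assumption, as are the function names, the `min_len` filter, and the `top_n` cut-off. The selected keywords would then be joined into a query string for a search system such as ChatNoir; no search API call is shown here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def select_keywords(document, reference_counts, reference_total,
                    top_n=5, min_len=4):
    """Rank words by frequency weighted by a contrastive ratio.

    Hypothetical scoring, not the measure used in the paper:
    score = count * (relative freq. in document / relative freq. in reference).
    Add-one smoothing keeps words unseen in the reference corpus finite.
    """
    tokens = [t for t in tokenize(document) if len(t) >= min_len]
    doc_counts = Counter(tokens)
    doc_total = sum(doc_counts.values())
    if doc_total == 0:
        return []

    scores = {}
    for word, count in doc_counts.items():
        doc_rel = count / doc_total
        ref_rel = (reference_counts.get(word, 0) + 1) / (reference_total + 1)
        scores[word] = count * (doc_rel / ref_rel)

    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Toy reference corpus counts (illustrative values only).
reference = Counter({"the": 1000, "with": 500, "that": 400, "system": 5})
reference_total = sum(reference.values())

suspicious = ("The plagiarism detection system compares plagiarism cases "
              "with plagiarism sources that the system indexed.")

keywords = select_keywords(suspicious, reference, reference_total, top_n=3)
query = " ".join(keywords)
print(query)
```

Words that are common in the reference corpus, such as "with", receive low scores even when they appear in the document, while frequent and distinctive words rise to the top of the query.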