Educated guesses and equality judgements: using search engines and pairwise match for external plagiarism detection
Notebook for PAN at CLEF 2012

This paper describes our approaches to the two subtasks, Candidate Document Retrieval and Detailed Comparison, of the Plagiarism Detection track at PAN 12. For the first, we describe how we combined frequency with a variation of a contrastive corpus measure to select keywords for querying the ChatNoir search system; for the second, we give an overview of how we re-used software that had previously featured in PAN 11. We comment specifically on how effective both approaches were, and on what steps we might take to improve them if the competition remains substantially similar next time.