The Encoplot Similarity Measure for Automatic Detection of Plagiarism - Notebook for PAN at CLEF 2011

This paper describes the evolution of our Encoplot method for automatic plagiarism detection and the results of our participation in the PAN’11 competition. The main novelties are a new similarity measure and a new ranking method, which together produce a much better ranking of the source–suspicious document pairs when selecting candidates for the detailed analysis phase. We obtained excellent results in the competition: first place on the manually paraphrased cases, second place overall in the external plagiarism detection task, and the best recall on the non-translated corpus.