Knowledge Graphs as Context Models: Improving the Detection of Cross-Language Plagiarism with Paraphrasing

Cross-language plagiarism detection aims to automatically identify and extract plagiarized fragments among documents written in different languages. A plagiarized fragment may be a verbatim translated copy, or its structure may be altered to hide the copying; the latter is known as paraphrasing and is more difficult to detect. To improve the detection of paraphrasing, we use a knowledge graph-based approach to obtain and compare context models of document fragments in different languages. Experimental results in German-English and Spanish-English cross-language plagiarism detection indicate that our knowledge graph-based approach performs better than other state-of-the-art models.
