Paper Information - Multi Queries Methods of the Chinese-English Bilingual Plagiarism Detection

Cross-language plagiarism detection identifies and extracts plagiarized text in a multilingual environment. In recent years, a significant amount of work has addressed English and other European languages, but Asian languages have received less attention. We compared a number of strategies for Chinese-English bilingual plagiarism detection. We present methods for candidate document retrieval and compare four of them: (i) document keyword-based queries, (ii) intrinsic plagiarism-based queries, (iii) header-based queries, and (iv) machine translation queries. Our evaluation indicated that keyword-based queries, the simplest and most efficient approach, give acceptable results for newspaper articles. We also compared keyword-based queries built from different percentages of a document's keywords; the results indicated that including 50% of the keywords in the queries yields a satisfactory set of candidate documents.
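
The keyword-percentage idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how keywords are selected, ranked, or translated, so everything here is an assumption made for illustration: the function name `build_keyword_query`, ranking by raw term frequency, and the stopword filter are hypothetical, and input is assumed to be already tokenized.

```python
import math
from collections import Counter


def build_keyword_query(tokens, fraction=0.5, stopwords=frozenset()):
    """Return the top `fraction` of distinct keywords as query terms.

    Assumption: keywords are ranked by raw term frequency, with ties
    broken by first occurrence. The paper may rank them differently.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Normalize case and drop stopwords before counting.
    terms = [t.lower() for t in tokens if t.lower() not in stopwords]
    counts = Counter(terms)

    # Record where each term first appears so ties sort deterministically.
    first_seen = {}
    for position, term in enumerate(terms):
        first_seen.setdefault(term, position)

    ranked = sorted(counts, key=lambda t: (-counts[t], first_seen[t]))

    # Keep the requested share of distinct keywords, rounding up.
    keep = math.ceil(len(ranked) * fraction)
    return ranked[:keep]


tokens = ("plagiarism detection finds plagiarism in the "
          "translated text and the detection step").split()
query = build_keyword_query(tokens, fraction=0.5,
                            stopwords=frozenset({"in", "the", "and"}))
print(" ".join(query))
```

With `fraction=0.5`, half of the distinct non-stopword terms are kept; raising or lowering `fraction` corresponds to the different keyword percentages the abstract compares.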

Phil Vines | Hong Ye Chen | Hong Chen
