Paper Information - Sentence-Based Plagiarism Detection for Japanese Document Based on Common Nouns and Part-of-Speech Structure

Sentence-Based Plagiarism Detection for Japanese Document Based on Common Nouns and Part-of-Speech Structure

Plagiarism by copying and pasting documents written by other authors has recently become a serious problem as the number of electronic documents has increased. In higher education institutions, it is also a major concern in student reports. In this paper, we propose a novel approach to automatically detecting plagiarism, especially in student experimental reports written in Japanese, that focuses on the common nouns and the part-of-speech structure of each sentence. We also performed experiments to evaluate our approach on actual Japanese experimental reports written by our students, using measures such as precision, recall, and F-value. The experimental results show that our proposed approach detected plagiarized pairs of sentences with high accuracy. In addition, we discuss the cases that our proposed approach detected incorrectly and those it failed to detect.
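The abstract names precision, recall, and F-value as the evaluation measures. As a minimal sketch of how such measures could be computed over detected sentence pairs (not the authors' evaluation code), the following assumes that each pair is identified by a tuple of sentence IDs and that a manually labeled set of truly plagiarized pairs is available. The function name `evaluate_pairs` and all IDs are hypothetical.

```python
def evaluate_pairs(detected, actual):
    """Return (precision, recall, f_value) for two collections of sentence pairs.

    detected: pairs flagged as plagiarized by a detector (assumed input)
    actual:   pairs labeled as plagiarized by a human (assumed input)
    """
    detected, actual = set(detected), set(actual)
    true_positives = len(detected & actual)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-value as the harmonic mean of precision and recall.
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical sentence-pair IDs of the form (report_sentence, report_sentence).
detected = {("r1_s3", "r2_s5"), ("r1_s4", "r2_s6"), ("r3_s1", "r4_s2")}
actual = {("r1_s3", "r2_s5"), ("r1_s4", "r2_s6"), ("r5_s2", "r6_s7")}

precision, recall, f_value = evaluate_pairs(detected, actual)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_value:.3f}")
```

In this sketch a wrongly flagged pair lowers precision, and a plagiarized pair that goes unflagged lowers recall, which corresponds to the two kinds of error the abstract says are discussed.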

Takeru Yokoi

[1] Naomie Salim, et al. An improved plagiarism detection scheme based on semantic role labeling, 2012, Appl. Soft Comput.

[2] Yuji Matsumoto, et al. Applying Conditional Random Fields to Japanese Morphological Analysis, 2004, EMNLP.

[3] Hermann A. Maurer, et al. Plagiarism - A Survey, 2006, J. Univers. Comput. Sci.

[4] Seong-Bae Park, et al. An application for plagiarized source code detection based on a parse tree kernel, 2013, Eng. Appl. Artif. Intell.

[5] Maria Soledad Pera, et al. Nowhere to Hide: Finding Plagiarized Documents Based on Sentence Similarity, 2008, 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.

[6] Wei Lee Woon, et al. Text plagiarism detection method based on path patterns, 2008, Int. J. Bus. Intell. Data Min.

[7] Hector Garcia-Molina, et al. Copy detection mechanisms for digital documents, 1995, SIGMOD '95.

[8] Mike Joy, et al. Sentence-based natural language plagiarism detection, 2004, JERC.