Paper Information - Towards the Detection of Cross-Language Source Code Reuse

Towards the Detection of Cross-Language Source Code Reuse

The Internet has made huge amounts of information available, including source code. Source code repositories and, more generally, programming-related websites facilitate its reuse. In this work, we propose a simple approach to the detection of cross-language source code reuse, a barely investigated problem. Our preliminary experiments, based on the comparison of character n-grams, show that considering different sections of the code (i.e., comments, code, reserved words, etc.) leads to different results. When considering three programming languages (C++, Java, and Python), the best result is obtained when comments are discarded and the entire source code is considered.
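The character n-gram comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the n-gram size, the weighting scheme, or the similarity measure, so the choices below (character trigrams, raw frequency counts, cosine similarity) are assumptions made for illustration only. The function names `strip_comments`, `char_ngrams`, `cosine_similarity`, and `code_similarity` are hypothetical, and the comment stripping is a naive regular-expression approximation that ignores string literals.

```python
import re
from collections import Counter
from math import sqrt


def strip_comments(source: str, line_marker: str) -> str:
    """Naively drop comments (assumption: '//' for C++/Java, '#' for Python).

    String literals are not parsed, so a marker inside a string is also cut.
    """
    if line_marker == "//":
        # C-style block comments, possibly spanning several lines.
        source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    # Line comments: everything from the marker to the end of the line.
    return re.sub(re.escape(line_marker) + r".*", " ", source)


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine between two n-gram frequency vectors; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


def code_similarity(source_a: str, source_b: str, n: int = 3) -> float:
    """Similarity of two source files based on their character n-gram profiles."""
    return cosine_similarity(char_ngrams(source_a, n), char_ngrams(source_b, n))


if __name__ == "__main__":
    java_code = """
    // sum the values in an array
    int total = 0;
    for (int i = 0; i < values.length; i++) { total += values[i]; }
    """
    python_code = """
    # sum the values in a list
    total = 0
    for i in range(len(values)):
        total += values[i]
    """
    with_comments = code_similarity(java_code, python_code)
    without_comments = code_similarity(
        strip_comments(java_code, "//"), strip_comments(python_code, "#")
    )
    print(f"with comments:    {with_comments:.3f}")
    print(f"without comments: {without_comments:.3f}")
```

Running the script prints one score for the full snippets and one for the snippets with comments removed, which mirrors the kind of comparison between code sections that the abstract describes. The scores from this toy example say nothing about the results reported in the paper.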

Alberto Barrón-Cedeño | Paolo Rosso | Lidia Moreno | Enrique Flores

[1] Alberto Barrón-Cedeño, et al. A statistical approach to crosslingual natural language tasks, 2008, LA-NMR.

[2] Benno Stein, et al. Cross-language plagiarism detection, 2011, Lang. Resour. Evaluation.

[3] S. K. Robinson, et al. An empirical approach for detecting program similarity and plagiarism within a university programming environment, 1987.

[4] Hugo T. Jankowitz. Detecting Plagiarism in Student Pascal Programs, 1988, Comput. J.

[5] Efstathios Stamatatos, et al. Intrinsic Plagiarism Detection Using Character n-gram Profiles, 2009.

[6] Seyed M. M. Tahaghoghi, et al. Plagiarism detection across programming languages, 2006, ACSC.

[7] Francisco Rosales, et al. Detection of Plagiarism in Programming Assignments, 2008, IEEE Transactions on Education.