Measuring code similarity using word mover's distance for programming course

Teachers tend to ask students to submit their assignments online, not only in online courses but also in face-to-face courses. Plagiarism is becoming more and more serious because resources are easy to find on the Internet, especially in computer programming courses. This paper aims to develop a robust automated technique for detecting code plagiarism in programming courses. After analyzing and summarizing the state of the art in code plagiarism detection, we develop a more robust detection technique that combines word2vec with the Word Mover's Distance (WMD) similarity metric. We consider the different plagiarism methods that students use when they submit their program source code. We then collect more than 20 thousand code submissions from our introductory C++ programming course for non-major students and manually check whether each one is plagiarized. In the process, we examine how our proposed method compares with two other main algorithms and how suitable each is for different plagiarism characteristics. The results obtained on the dataset indicate that our approach is well suited for detecting different types of code plagiarism. We conclude that incorporating the WMD similarity metric is crucial for improved effectiveness and adaptability.
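To make the idea of comparing token embeddings with a transport-style distance concrete, the following is a minimal sketch and not the implementation evaluated in this paper. It computes the relaxed WMD lower bound (each token moves all of its weight to the nearest token in the other submission) rather than the exact WMD, which requires solving an optimal transport problem. The two-dimensional vectors in `TOY_EMBEDDINGS`, the token lists, and all function names are hypothetical and chosen only for illustration; in practice the vectors would come from a word2vec model trained on source code tokens.

```python
import math
from collections import Counter

# Hypothetical 2-D token embeddings, hand-picked for illustration only.
# A real system would load vectors from a trained word2vec model.
TOY_EMBEDDINGS = {
    "int": (1.0, 0.0),
    "long": (0.9, 0.1),
    "for": (0.0, 1.0),
    "while": (0.1, 0.9),
}


def normalized_bag(tokens):
    """Return token -> relative frequency (normalized bag of words)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def one_sided_cost(bag_a, bag_b, emb):
    """Move each token of bag_a entirely to its nearest token in bag_b."""
    return sum(
        weight * min(math.dist(emb[tok], emb[other]) for other in bag_b)
        for tok, weight in bag_a.items()
    )


def relaxed_wmd(tokens_a, tokens_b, emb):
    """Relaxed WMD: the larger of the two one-sided costs.

    This is a lower bound on the exact Word Mover's Distance,
    not the exact distance itself.
    """
    bag_a = normalized_bag(tokens_a)
    bag_b = normalized_bag(tokens_b)
    return max(
        one_sided_cost(bag_a, bag_b, emb),
        one_sided_cost(bag_b, bag_a, emb),
    )


if __name__ == "__main__":
    original = ["int", "for"]
    renamed = ["long", "while"]    # near-synonymous tokens swapped in
    different = ["for", "while"]   # no declaration-like token at all
    print(relaxed_wmd(original, renamed, TOY_EMBEDDINGS))
    print(relaxed_wmd(original, different, TOY_EMBEDDINGS))
```

Under these toy vectors, replacing a token with a close neighbor in embedding space changes the distance only slightly, which is the property that lets an embedding-based distance tolerate substitutions that defeat exact token matching.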