Paper Information - Differential Weight Based Hybrid Approach to Detect Software Plagiarism

In this paper we propose three representations of source code, each intended to highlight a different aspect of the code: (i) lexical, (ii) structural, and (iii) stylistic. For the lexical view, we use the Levenshtein distance, computed after excluding all reserved words of the programming language. For the structural view, we propose a similarity metric that takes into account the function signatures and variable declarations within the source code. The stylistic view consists of several features, such as the number of white spaces, lines of code, and upper-case letters. Finally, we combine these representations in several ways. The results indicate that the proposed representations provide information that makes it possible to detect particular cases of source code re-use.
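The lexical and stylistic views described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`strip_reserved`, `levenshtein`, `lexical_similarity`, `stylistic_features`) are hypothetical, Python's keyword list stands in for "the reserved words of the programming language", and normalizing the distance to a score in [0, 1] is an assumption, since the abstract does not say how the distance is scaled.

```python
import keyword
import re

# Assumption: Python keywords play the role of the language's reserved words.
RESERVED = set(keyword.kwlist)


def strip_reserved(source: str) -> str:
    """Tokenize crudely and drop reserved words, joining the rest with spaces."""
    tokens = re.findall(r"[A-Za-z_]\w*|\S", source)
    return " ".join(t for t in tokens if t not in RESERVED)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping two rows in memory."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def lexical_similarity(src_a: str, src_b: str) -> float:
    """Similarity in [0, 1]; the normalization by length is an assumption."""
    a, b = strip_reserved(src_a), strip_reserved(src_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def stylistic_features(source: str) -> dict:
    """Counts for the three stylistic features named in the abstract."""
    return {
        "whitespace": sum(ch.isspace() for ch in source),
        "lines": len(source.splitlines()),
        "uppercase": sum(ch.isupper() for ch in source),
    }
```

Two submissions could then be compared by calling `lexical_similarity(code_a, code_b)` and by comparing the dictionaries returned by `stylistic_features`. How the views are weighted and combined is left open here, because the abstract does not specify it.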

Sandip Modha | Dhruv Dave | Nrupesh Shah
