Paper information - Source code similarity detection by using data mining methods

Source code similarity detection by using data mining methods

Programming courses at the university and high-school level, as well as competitions in informatics (programming), often require fast assessment of the solutions submitted for programming tasks. This problem is usually solved with automated systems that check the output each solution produces for a set of test cases. In this paper we present a novel approach that represents program code as vectors and uses these vectors in data mining analysis, which could produce better assessment of the solutions. We present the results of cluster analysis, which reach up to 88% correctly clustered items on average.
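The abstract does not say which features make up the vectors or which clustering algorithm is used, so the following is only a minimal sketch of the general idea under stated assumptions. The feature list `FEATURES`, the tokenizer `TOKEN_RE`, and the greedy threshold clustering in `cluster` are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical feature set (an assumption, not the paper's): a few C-like
# keywords and operators whose occurrence counts form the vector.
FEATURES = ["for", "while", "if", "else", "return", "+", "-", "*", "/", "%", "=="]

# Matches identifiers/keywords, the "==" operator, and single arithmetic operators.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|==|[-+*/%]")


def to_vector(source):
    """Map source text to a vector of feature counts, in FEATURES order."""
    counts = Counter(TOKEN_RE.findall(source))
    return [float(counts[feature]) for feature in FEATURES]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster(vectors, threshold=0.9):
    """Greedy clustering (illustrative stand-in for a real clustering algorithm).

    Each vector joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = []
    for index, vector in enumerate(vectors):
        for members in clusters:
            if cosine(vectors[members[0]], vector) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters
```

Because the vectors depend only on keyword and operator counts, two solutions that differ only in variable names map to the same vector and land in the same cluster, while a structurally different solution is kept apart.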

Mile Jovanov | Ana Madevska Bogdanova | Emil Stankov
