Automatic programming error class identification with code plagiarism-based clustering

Online platforms for learning programming are very popular nowadays. These platforms must automatically assess the code submitted by learners and provide high-quality feedback to support their learning. Classical techniques for producing useful feedback include using unit testing frameworks to perform systematic functional tests of the submitted code, or using code quality assessment tools. This paper explores how to automatically identify error classes by clustering a set of submitted programs, using code plagiarism detection tools to measure the similarity between them. The proposed approach and analysis framework are presented, along with a first experiment on the Code Hunt dataset.
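The general pipeline (measure pairwise similarity between submissions with a plagiarism-style detector, then cluster the submissions) can be sketched as follows. This is a minimal sketch under stated assumptions, not the tooling used in the paper. Instead of calling an external plagiarism detection tool, it computes a token-based similarity. Identifiers are replaced by a placeholder, k-gram hashes are winnowed into fingerprints, and two fingerprint sets are compared with the Jaccard index. Submissions are then grouped by single-linkage clustering cut at a similarity threshold. The keyword set, the parameters `k`, `w` and `threshold`, the sample programs and all function names are hypothetical choices for illustration.

```python
import re
import zlib
from itertools import combinations

# Hypothetical keyword list for a C-like language; a real tool would use the
# full lexical grammar of the submission language.
KEYWORDS = {"int", "for", "while", "if", "else", "return"}


def tokens(code: str) -> list[str]:
    """Crude lexer: identifiers/keywords, integer literals, single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def normalize(toks: list[str]) -> list[str]:
    # Replace every non-keyword identifier by "ID" so that renaming variables
    # does not hide structural similarity.
    return [
        "ID" if (t[0].isalpha() or t[0] == "_") and t not in KEYWORDS else t
        for t in toks
    ]


def fingerprints(code: str, k: int = 5, w: int = 4) -> set[int]:
    """Winnowing: keep the minimum k-gram hash of every window of w hashes."""
    toks = normalize(tokens(code))
    # crc32 is deterministic across runs, unlike Python's built-in str hash.
    hashes = [zlib.crc32(" ".join(toks[i:i + k]).encode())
              for i in range(len(toks) - k + 1)]
    if len(hashes) < w:  # very short program: keep every k-gram hash
        return set(hashes)
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def jaccard(a: set[int], b: set[int]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(codes: dict[str, str], threshold: float = 0.5) -> list[list[str]]:
    """Single-linkage clustering: link any pair with similarity >= threshold."""
    fps = {name: fingerprints(src) for name, src in codes.items()}
    parent = {name: name for name in codes}

    def find(x: str) -> str:  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(codes, 2):
        if jaccard(fps[a], fps[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in codes:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical submissions: s2 is s1 with renamed variables, s3 differs.
SAMPLES = {
    "s1": "int f(int n) { int s = 0; for (int i = 0; i <= n; i++) s += i; return s; }",
    "s2": "int f(int m) { int t = 0; for (int j = 0; j <= m; j++) t += j; return t; }",
    "s3": "int f(int n) { return n * (n - 1) / 2; }",
}

if __name__ == "__main__":
    print(cluster(SAMPLES))
```

Running it groups `s1` and `s2` together (their normalized token streams are identical) and leaves `s3` alone: `[['s1', 's2'], ['s3']]`. Single linkage was chosen here only because it reduces to connected components and needs no third-party library. Average-linkage or k-medoids clustering over the same similarity matrix would be equally valid choices.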
