Automatic Clustering of Different Solutions to Programming Assignments in Computing Education

A computer programming assignment may have many different solutions, and identifying these distinct solutions is valuable for both teaching and learning. However, when the number of solutions is large, it can be difficult for instructors and students to see how they differ. Because code similarity is central to distinguishing between solutions, we review previous research on code similarity and design a neural-network-based algorithm that detects the similarity between a pair of code samples and identifies the features that most strongly influence that similarity. We then develop a clustering algorithm based on this similarity measure that automatically groups all correct solutions to a given programming assignment. Our experiment shows that the clustering algorithm obtains distinct clusters on our dataset. Our analysis of typical solutions can offer insights to instructors and students.
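The abstract does not specify the network architecture, the features, or the clustering procedure. The following is therefore only a minimal Python sketch of the general two-stage pipeline (score each pair of solutions, then cluster on those scores), under stated assumptions. `jaccard_similarity` is a hypothetical stand-in for the paper's neural similarity model and simply compares token sets. `cluster_solutions` and its `threshold` parameter are likewise illustrative. They use single-linkage grouping, meaning connected components over all pairs scoring at or above the threshold, and this is not presented as the authors' algorithm.

```python
import re
from itertools import combinations


def tokens(source: str) -> set[str]:
    """Split source code into a set of identifier, number, and symbol tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|\S", source))


def jaccard_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in similarity in [0, 1].

    Assumption: a learned (e.g. neural) pairwise model would replace this.
    """
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cluster_solutions(solutions: list[str], threshold: float = 0.6) -> list[list[int]]:
    """Group solution indices whose pairwise similarity is >= threshold.

    Union-find merges every qualifying pair, so the result is the set of
    connected components of the similarity graph (single-linkage clustering
    cut at `threshold`). `threshold` is an illustrative parameter.
    """
    parent = list(range(len(solutions)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once and merge the groups of similar solutions.
    for i, j in combinations(range(len(solutions)), 2):
        if jaccard_similarity(solutions[i], solutions[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect indices by their root representative.
    clusters: dict[int, list[int]] = {}
    for i in range(len(solutions)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


# Usage: two solutions that call the builtin sum, and one explicit loop.
solutions = [
    "def total(xs):\n    return sum(xs)",
    "def total(xs):\n    return sum(xs)  # builtin",
    "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s",
]
print(cluster_solutions(solutions))  # [[0, 1], [2]]
```

Single-linkage at a fixed threshold is the simplest way to turn pairwise scores into groups, but it can chain dissimilar solutions together through intermediate ones. A learned similarity model and a more robust clustering method may be preferable on real course data.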