Learning Program Embeddings to Propagate Feedback on Student Code

Providing feedback, both assessing final work and giving hints to stuck students, is difficult for open-ended assignments in massive online classes, which can range from thousands to millions of students. We introduce a neural network method that encodes each program as a linear mapping from an embedded precondition space to an embedded postcondition space, and we propose an algorithm for feedback at scale that uses these linear maps as features. We apply our algorithm to assessments from the Code.org Hour of Code and Stanford University's CS1 course, where we propagate human comments on student assignments to orders of magnitude more submissions.
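The idea of treating a program as a linear map, and then using that map as a feature vector for propagating comments, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the 2x2 matrices are hand-written rather than learned by a neural network, the comments are invented, and nearest-neighbor matching on flattened matrices is a simple stand-in for the feedback algorithm, which the abstract does not specify.

```python
from math import sqrt


def apply_program(matrix, pre):
    """Map an embedded precondition vector to an embedded postcondition vector."""
    return [sum(m * p for m, p in zip(row, pre)) for row in matrix]


def flatten(matrix):
    """Use the entries of a program's matrix as its feature vector."""
    return [value for row in matrix for value in row]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def propagate_feedback(labeled, unlabeled):
    """Copy each human comment to the nearest unlabeled submission.

    labeled:   list of (matrix, comment) pairs annotated by a human
    unlabeled: dict mapping a submission name to its matrix
    Nearest-neighbor matching is an assumption of this sketch.
    """
    result = {}
    for name, matrix in unlabeled.items():
        features = flatten(matrix)
        _, comment = min(
            labeled, key=lambda item: distance(flatten(item[0]), features)
        )
        result[name] = comment
    return result


# Hypothetical program matrices; in the paper these would be learned.
labeled = [
    ([[1.0, 0.0], [0.0, 1.0]], "correct"),
    ([[0.0, 1.0], [1.0, 0.0]], "swapped turn directions"),
]
unlabeled = {
    "student_a": [[0.9, 0.1], [0.0, 1.1]],
    "student_b": [[0.1, 0.8], [1.2, 0.0]],
}

feedback = propagate_feedback(labeled, unlabeled)
for name, comment in feedback.items():
    print(name, "->", comment)
```

In this toy setting two human comments reach every unlabeled submission whose matrix lies close to an annotated one, which is the sense in which a small amount of human effort could extend to many more submissions.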
