A Systematic Review of Deep Learning Approaches to Educational Data Mining

Educational Data Mining (EDM) is a research field that focuses on the application of data mining, machine learning, and statistical methods to detect patterns in large collections of educational data. Different machine learning techniques have been applied in this field over the years, but only recently has Deep Learning gained increasing attention in the educational domain. Deep Learning is a machine learning method based on neural network architectures with multiple layers of processing units, which has been successfully applied to a broad set of problems in the areas of image recognition and natural language processing. This paper surveys the research on Deep Learning techniques applied to EDM, from its origins to the present day. The main goals of this study are to identify the EDM tasks that have benefited from Deep Learning and those that remain to be explored, to describe the main datasets used, to provide an overview of the key concepts, main architectures, and configurations of Deep Learning and its applications to EDM, and to discuss the current state of the art and future directions in this area of research.
