An Integrated Framework Based on Latent Variational Autoencoder for Providing Early Warning of At-Risk Students

The rapid development of learning technologies has made online learning increasingly popular in both higher education and K-12, and predicting student performance has become one of the most active research topics in education. However, traditional prediction algorithms were designed for balanced datasets, whereas educational datasets are typically highly imbalanced, which makes it harder to identify at-risk students accurately. To address this problem, this study proposes an integrated framework (LVAEPre) that combines a latent variational autoencoder (LVAE) with a deep neural network (DNN) to alleviate the imbalanced distribution of educational datasets and to provide early warning of at-risk students. Specifically, with the characteristics of educational data in mind, the LVAE learns the latent distribution of at-risk students and generates at-risk samples in order to obtain a balanced dataset; the DNN then performs the final performance prediction. Extensive experiments on a collected K-12 dataset show that LVAEPre handles the imbalanced educational dataset effectively and provides much better and more stable prediction results than the baseline methods in terms of accuracy and $F_{1.5}$ score. A comparison of t-SNE visualization results further confirms the advantage of the LVAE in dealing with class imbalance in educational datasets. Finally, based on the significant predictors identified by LVAEPre in the experimental dataset, some suggestions for designing pedagogical interventions are put forward.
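As a reading aid for the $F_{1.5}$ metric mentioned above, the following is a minimal sketch of the standard $F_\beta$ score computed from confusion counts, assuming the at-risk class is treated as the positive class. The abstract does not spell out its exact definition, so the formula, the function name `f_beta`, and the example counts are illustrative assumptions, not the authors' implementation.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.5) -> float:
    """Standard F-beta score for the positive (assumed: at-risk) class.

    tp, fp, fn are true positives, false positives and false negatives.
    With beta > 1, recall is weighted more heavily than precision, so
    missed at-risk students (fn) cost more than false alarms (fp).
    """
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention (assumption): return 0.0 when there are no positives at all.
    return 0.0 if denom == 0 else (1 + b2) * tp / denom


# Hypothetical counts, for illustration only (not taken from the paper).
score_f1 = f_beta(tp=30, fp=20, fn=10, beta=1.0)
score_f15 = f_beta(tp=30, fp=20, fn=10, beta=1.5)
print(round(score_f1, 4), round(score_f15, 4))
```

In this hypothetical case recall (0.75) exceeds precision (0.60), so the score rises when moving from $\beta = 1$ to $\beta = 1.5$; a classifier that misses many at-risk students would be penalized more strongly instead.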
