The performance of some machine learning approaches and a rich context model in student answer prediction

Web-based learning systems that adapt and personalize content are increasingly common as a way to offer interactive learning materials to the wide diversity of students in online education. Learners' interactions and study practices (quizzing, reading, exams) can be analyzed to gain insight into a student's learning style, study schedule, knowledge, and performance. Quizzing data can help to create an individualized spaced repetition algorithm that improves long-term retention of knowledge and makes learning on online platforms more efficient. Current spaced repetition algorithms rely on pre-defined repetition rules and parameters that might not fit the different learning styles of students on online platforms. This study uses several machine learning models and a rich context model to analyze quizzing and reading records from an e-learning platform called Hypocampus, with the aim of identifying the features relevant to predicting the learning outcome (quiz answers). By predicting answer correctness, a learning system might be able to recommend a personalized repetition schedule for questions that maximizes long-term memory retention. The results show that the question difficulty level and the student's previous incorrectly answered questions are useful features for predicting the correctness of a student's answer. The gradient-boosted tree and XGBoost models perform best at predicting the correctness of a student's answer before the quiz is answered. Additionally, a non-linear relationship was found between reading behavior on the platform and quiz performance, and including it improves the accuracy of all models used.
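As a rough illustration of the two features the abstract highlights, the sketch below derives a question difficulty value and a count of previously incorrect answers from a toy quiz log. The log format, the function names, and the definition of difficulty as the fraction of incorrect answers are assumptions made for illustration; they are not taken from the study or from the Hypocampus platform.

```python
from collections import defaultdict

# Hypothetical quiz log in chronological order:
# (student_id, question_id, answered_correctly)
quiz_log = [
    ("s1", "q1", True),
    ("s1", "q2", False),
    ("s2", "q1", False),
    ("s2", "q2", False),
    ("s1", "q3", False),
    ("s2", "q3", True),
]


def question_difficulty(log):
    """Fraction of incorrect answers per question (an assumed definition)."""
    attempts = defaultdict(int)
    wrong = defaultdict(int)
    for _, question, correct in log:
        attempts[question] += 1
        if not correct:
            wrong[question] += 1
    return {q: wrong[q] / attempts[q] for q in attempts}


def build_features(log):
    """Build one feature row per attempt.

    Difficulty is computed over the whole log for brevity; a real pipeline
    would estimate it on a separate training split to avoid label leakage.
    """
    difficulty = question_difficulty(log)
    prior_wrong = defaultdict(int)  # incorrect answers so far, per student
    rows = []
    for student, question, correct in log:
        rows.append({
            "student": student,
            "question": question,
            "difficulty": difficulty[question],
            # Counted before the current answer is taken into account.
            "prior_incorrect": prior_wrong[student],
            "label": int(correct),
        })
        if not correct:
            prior_wrong[student] += 1
    return rows


rows = build_features(quiz_log)
```

Rows of this shape could then be passed to a tree-based classifier such as a gradient-boosted model, with `label` as the target and the other numeric columns as inputs.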
