Dropout Prediction in MOOCs: A Comparison Between Process and Sequence Mining

Massive Open Online Courses (MOOCs) have developed rapidly in recent years. However, a major issue in online education is the high dropout rate among participants. Many studies have explored this issue, using quantitative and qualitative methods to analyze student attrition. Nevertheless, there is a lack of studies that (1) predict the actual moment of dropout, which would create opportunities to improve student retention in MOOCs through timely interventions; and (2) compare the performance of such prediction algorithms. In this paper, we aim to predict student dropout in MOOCs using process and sequence mining techniques, and provide a comparative analysis of these techniques. We perform a case study based on data from the KU Leuven online course “Trends in e-Psychology”, available on the edX platform. The results reveal that, while process mining is better suited to descriptive analysis, sequence mining techniques provide better features for predictive purposes.
