Probabilistic Use Cases: Discovering Behavioral Patterns for Predicting Certification

Advances in open online education have led to a dramatic increase in the size, diversity, and traceability of learner populations, offering tremendous opportunities to study the detailed learning behavior of users around the world. This paper adapts the topic modeling approach of Latent Dirichlet Allocation (LDA) to uncover behavioral structure from student logs in an MITx Massive Open Online Course, 8.02x: Electricity and Magnetism. LDA is typically used in natural language processing, where it identifies the latent topic structure within a collection of documents. However, the framework can be adapted to the analysis of user behavior by treating a user's interactions with courseware as a "bag of interactions", equivalent to the "bag of words" model found in topic modeling. With this representation, LDA forms probabilistic use cases that cluster students based on their behavior. Through the probability distributions associated with each use case, this approach provides an interpretable representation of user access patterns while reducing the dimensionality of the data and improving accuracy. Using only the first week of logs, we can predict whether or not a student will earn a certificate with a cross-validation accuracy of 0.81 ± 0.01. Thus, the method presented in this paper is a powerful tool for understanding user behavior and predicting outcomes.
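The "bag of interactions" idea can be illustrated with a minimal sketch. It is not the paper's implementation: the event format, the resource names, the function names `bag_of_interactions` and `fit_lda`, and the hyperparameters `alpha` and `beta` are all assumptions made for illustration, and the inference shown is a plain collapsed Gibbs sampler, which may differ from the inference procedure the authors used. Each student plays the role of a document, each courseware resource the role of a word, and each latent topic the role of a use case.

```python
import random
from collections import Counter


def bag_of_interactions(events):
    """Collapse (user, resource) log events into per-user interaction counts."""
    bags = {}
    for user, resource in events:
        bags.setdefault(user, Counter())[resource] += 1
    return bags


def fit_lda(bags, n_use_cases, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over bags of interactions.

    Returns (theta, phi, vocab): theta[user] is that user's mixture over
    use cases, phi[k] is use case k's distribution over resources.
    """
    rng = random.Random(seed)
    users = sorted(bags)
    vocab = sorted({r for bag in bags.values() for r in bag})
    index = {r: i for i, r in enumerate(vocab)}
    # Expand counts into one token per interaction; order carries no meaning.
    docs = [[index[r] for r, c in sorted(bags[u].items()) for _ in range(c)]
            for u in users]
    K, V = n_use_cases, len(vocab)
    doc_topic = [[0] * K for _ in docs]
    topic_word = [[0] * V for _ in range(K)]
    topic_total = [0] * K
    # Random initial assignment of every interaction to a use case.
    assign = []
    for d, doc in enumerate(docs):
        zs = [rng.randrange(K) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the current assignment, then resample it.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [(doc_topic[d][k] + alpha)
                           * (topic_word[k][w] + beta)
                           / (topic_total[k] + V * beta) for k in range(K)]
                z = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    theta = {u: [(doc_topic[d][k] + alpha) / (len(docs[d]) + K * alpha)
                 for k in range(K)] for d, u in enumerate(users)}
    phi = [[(topic_word[k][w] + beta) / (topic_total[k] + V * beta)
            for w in range(V)] for k in range(K)]
    return theta, phi, vocab


# Hypothetical toy log: (student, resource) pairs, not real 8.02x data.
events = [
    ("s1", "video_1"), ("s1", "video_1"), ("s1", "video_2"),
    ("s2", "video_1"), ("s2", "video_2"), ("s2", "video_2"),
    ("s3", "problem_1"), ("s3", "problem_2"), ("s3", "problem_1"),
    ("s4", "problem_2"), ("s4", "problem_1"), ("s4", "video_1"),
]
bags = bag_of_interactions(events)
theta, phi, vocab = fit_lda(bags, n_use_cases=2)
```

Here `theta` gives each student a low-dimensional mixture over use cases, which could in principle serve as the feature vector for a certification classifier, while each row of `phi` is the interpretable access pattern of one use case. The choice of classifier and of the number of use cases is left open in this sketch.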
