Tracing Forum Posts to MOOC Content using Topic Analysis

Massive Open Online Courses (MOOCs) are educational programs that are open and accessible to large numbers of people over the internet. To support learning, MOOCs provide discussion forums where students and instructors exchange questions, answers, and thoughts related to the course. The primary objective of this paper is to investigate whether discussion forum posts can be traced back to course lecture videos and readings using topic analysis. We use both unsupervised and supervised variants of Latent Dirichlet Allocation (LDA) to extract topics from course material and classify forum posts. We validate our approach on posts bootstrapped from five Coursera courses and find that topic models can map student discussion posts back to the underlying course lecture or reading. Labeled LDA outperforms both unsupervised Hierarchical Dirichlet Process LDA and base LDA on our traceability task. This research provides an automated approach for clustering student discussions by course material, enabling instructors to quickly identify where students misunderstand content and clarify materials accordingly.
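To make the tracing idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each lecture or reading carries exactly one label; in that single-label case the topic-word distributions of Labeled LDA reduce to smoothed per-label word frequencies. A post is then scored by its log-likelihood under each label, which is a simplification swapped in for full posterior inference. The function names, the smoothing parameter `beta`, and the toy course texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_label_word_distributions(materials, beta=0.01):
    """Estimate one smoothed word distribution per course-material label.

    materials: dict mapping a label (e.g. a lecture id) to its text.
    beta: hypothetical symmetric smoothing constant (assumed value).
    Returns (phi, vocab), where phi[label][word] is a probability.
    """
    counts = {label: Counter(tokenize(text)) for label, text in materials.items()}
    vocab = set().union(*counts.values())
    phi = {}
    for label, label_counts in counts.items():
        total = sum(label_counts.values()) + beta * len(vocab)
        phi[label] = {w: (label_counts[w] + beta) / total for w in vocab}
    return phi, vocab


def trace_post(post, phi, vocab):
    """Return the best-scoring label for a post, plus all label scores.

    Words outside the course vocabulary are ignored. If no word of the
    post is in the vocabulary, every score is 0 and the result is arbitrary.
    """
    tokens = [w for w in tokenize(post) if w in vocab]
    scores = {
        label: sum(math.log(dist[w]) for w in tokens)
        for label, dist in phi.items()
    }
    return max(scores, key=scores.get), scores


# Toy course material, invented purely for illustration.
materials = {
    "lecture_1_linear_regression": (
        "linear regression fits a line to data by minimizing "
        "squared error between predictions and targets"
    ),
    "lecture_2_neural_networks": (
        "neural networks stack layers of neurons and are trained "
        "with backpropagation and gradient descent"
    ),
}

phi, vocab = fit_label_word_distributions(materials)
best_label, scores = trace_post(
    "I am confused about how backpropagation updates the layers", phi, vocab
)
print(best_label)
```

In practice the unsupervised variants would be trained with a topic-modeling library on the full lecture transcripts and readings, and posts would be assigned through inferred topic mixtures rather than the direct likelihood comparison used here.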
