Mining MOOC Lecture Transcripts to Construct Concept Dependency Graphs

This paper addresses the problem of identifying a concept dependency graph for a MOOC through unsupervised analysis of lecture transcripts. The problem is important: extracting a concept graph is the first step in helping students with varying preparation understand course material. The problem is also challenging: instructors are unaware of the diversity in student preparation and may be unable to identify the right resolution of the concepts, necessitating costly updates; inferring concepts from groups suffers from polysemy; and the temporal order of concepts depends on the concepts in question. We propose innovative unsupervised methods to discover directed concept dependencies within and between lectures. Our main technical innovation lies in exploiting the temporal ordering among concepts to discover the graph. We propose two measures, the Bridge Ensemble Measure and the Global Direction Measure, to infer the existence and the direction of dependency relations between concepts. The Bridge Ensemble Measure identifies concept overlap between lectures, determines concept co-occurrence within short windows, and identifies the lecture in which each concept first occurs. The Global Direction Measure incorporates time directly by analyzing the time ordering of concepts both globally and within lectures. Experiments on real-world MOOC data show that our method outperforms the baseline in both AUC and precision/recall curves.
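
The signals named in the abstract (first-occurrence lecture, co-occurrence within short windows, and global time ordering) can be illustrated with a toy sketch. This is not the paper's Bridge Ensemble Measure or Global Direction Measure, whose formulas the abstract does not give; the function names, the window size, the 0.5 threshold, and the representation of each lecture as an ordered list of already-extracted concept mentions are all assumptions made for illustration.

```python
from collections import defaultdict


def first_lecture(lectures):
    """Map each concept to the index of the lecture where it first occurs."""
    first = {}
    for idx, concepts in enumerate(lectures):
        for concept in concepts:
            first.setdefault(concept, idx)
    return first


def window_cooccurrence(lectures, window=3):
    """Count pairs of distinct concepts mentioned within `window` positions
    of each other inside the same lecture (hypothetical window definition)."""
    counts = defaultdict(int)
    for concepts in lectures:
        for i, a in enumerate(concepts):
            for b in concepts[i + 1:i + window]:
                if a != b:
                    counts[frozenset((a, b))] += 1
    return counts


def direction_score(lectures, a, b):
    """Fraction of (a-mention, b-mention) pairs on the global timeline
    in which the mention of `a` comes before the mention of `b`."""
    timeline = [c for lecture in lectures for c in lecture]
    pos_a = [i for i, c in enumerate(timeline) if c == a]
    pos_b = [i for i, c in enumerate(timeline) if c == b]
    if not pos_a or not pos_b:
        return 0.5  # no evidence either way
    before = sum(1 for i in pos_a for j in pos_b if i < j)
    return before / (len(pos_a) * len(pos_b))


def dependency_edges(lectures, window=3, min_cooc=1):
    """Return edges (x, y), read as 'x tends to come before y'.

    Co-occurrence decides whether an edge exists; time ordering decides
    its direction, with first-occurrence lecture as the tie-breaker.
    """
    first = first_lecture(lectures)
    edges = set()
    for pair, n in window_cooccurrence(lectures, window).items():
        if n < min_cooc:
            continue
        a, b = sorted(pair)
        score = direction_score(lectures, a, b)
        if score > 0.5 or (score == 0.5 and first[a] < first[b]):
            edges.add((a, b))
        elif score < 0.5 or first[b] < first[a]:
            edges.add((b, a))
        # Pairs that are still tied get no directed edge.
    return edges


# Toy input: each lecture is an ordered list of concept mentions.
toy_lectures = [
    ["vector", "matrix", "vector"],
    ["matrix", "eigenvalue", "matrix"],
    ["eigenvalue", "svd"],
]
toy_edges = dependency_edges(toy_lectures)
```

On the toy input, each concept pair that co-occurs receives one directed edge from the concept that tends to appear earlier to the one that tends to appear later. A real pipeline would first need to extract concept mentions from transcripts, a step this sketch takes as given.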
