MOOCCube: A Large-scale Data Repository for NLP Applications in MOOCs

The prosperity of Massive Open Online Courses (MOOCs) provides fodder for much NLP and AI research on educational applications, e.g., course concept extraction and prerequisite relation discovery. However, publicly available MOOC datasets are limited in size and cover few types of data, which hinders the development of advanced models and novel approaches to related topics. Therefore, we present MOOCCube, a large-scale data repository of over 700 MOOCs, 100k concepts, and 8 million student behaviors, together with an external resource. Moreover, we conduct prerequisite relation discovery as an example application to show the potential of MOOCCube in facilitating relevant research. The data repository is now available at http://moocdata.cn/data/MOOCCube.
