Combining click-stream data with NLP tools to better understand MOOC completion

Completion rates for massive open online classes (MOOCs) are notoriously low. Identifying student patterns related to course completion may help in developing interventions that improve retention and learning outcomes in MOOCs. Previous research predicting MOOC completion has focused on click-stream data, student demographics, and natural language processing (NLP) analyses. However, most of this work has not taken full advantage of the multiple types of data available. This study combines click-stream data and NLP approaches to examine whether students' online activity and the language they produce in the online discussion forum predict successful class completion. We conduct this analysis on a subsample of 320 students in a MOOC on educational data mining, all of whom completed at least one graded assignment and produced at least 50 words in the discussion forums. The findings indicate that a combination of click-stream data and NLP indices can predict with substantial accuracy (78%) whether students complete the MOOC. This predictive power suggests that student interaction data and language data within a MOOC can help us both understand student retention in MOOCs and develop automated signals of student success.
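The abstract does not say which classifier or which NLP indices were used, so the following Python sketch only illustrates the general pattern of combining the two data sources, under explicit assumptions. It applies the subsample criteria stated above (at least one graded assignment and at least 50 forum words), merges hypothetical click-stream counts with three toy text indices (word count, type-token ratio, mean word length) that stand in for the study's NLP indices, and fits a plain logistic regression by batch gradient descent. Every name here (`Student`, `in_subsample`, `nlp_features`, `combined_features`, `train_logreg`, the `video_views` click count) and the choice of logistic regression are illustrative assumptions, not the authors' method.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Student:
    # Hypothetical record; field names are illustrative, not the study's schema.
    student_id: str
    graded_assignments: int                      # graded assignments submitted
    forum_text: str                              # all forum posts, concatenated
    clicks: dict = field(default_factory=dict)   # hypothetical click-stream counts
    completed: bool = False                      # outcome label


def in_subsample(s: Student, min_assignments: int = 1, min_words: int = 50) -> bool:
    """Inclusion criteria from the abstract: >= 1 graded assignment and >= 50 forum words."""
    return s.graded_assignments >= min_assignments and len(s.forum_text.split()) >= min_words


def nlp_features(text: str) -> dict:
    """Toy stand-ins for NLP indices (assumed; not the study's actual indices)."""
    words = text.split()
    n = len(words)
    if n == 0:
        return {"word_count": 0.0, "type_token_ratio": 0.0, "mean_word_length": 0.0}
    return {
        "word_count": float(n),
        "type_token_ratio": len({w.lower() for w in words}) / n,
        "mean_word_length": sum(len(w) for w in words) / n,
    }


def combined_features(s: Student, keys: list) -> list:
    """Merge click-stream counts and text indices into one fixed-order feature vector."""
    merged = {k: float(v) for k, v in s.clicks.items()}
    merged.update(nlp_features(s.forum_text))
    return [merged.get(k, 0.0) for k in keys]


def standardize(rows: list) -> list:
    """Z-score each column so click counts and ratios share a scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = []
    for c, m in zip(cols, means):
        sd = math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
        sds.append(sd if sd > 1e-12 else 1.0)  # constant columns map to (about) 0
    return [[(x - m) / sd for x, m, sd in zip(r, means, sds)] for r in rows]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: list, y: list, lr: float = 0.1, epochs: int = 500):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(X: list, y: list, w: list, b: float) -> float:
    """Share of students whose predicted completion (p >= 0.5) matches the label."""
    preds = [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Usage with made-up records (not study data):
keys = ["video_views", "word_count", "type_token_ratio", "mean_word_length"]
students = [Student(f"s{i}", 2, "word " * 60, {"video_views": v}, v > 10)
            for i, v in enumerate([2, 4, 6, 8, 20, 22, 24, 26])]
sample = [s for s in students if in_subsample(s)]
X = standardize([combined_features(s, keys) for s in sample])
y = [int(s.completed) for s in sample]
w, b = train_logreg(X, y)
print(accuracy(X, y, w, b))  # training accuracy only
```

The records in the usage lines are invented for illustration. Because `accuracy` is called on the same rows used for fitting, it reports training accuracy, which overstates predictive performance; an honest estimate needs held-out data or cross-validation.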
