Interpretable Multiview Early Warning System Adapted to Underrepresented Student Populations

Early warning systems have been progressively implemented in higher education institutions to predict student performance. However, they usually fail to integrate the many information sources available at universities effectively enough to make more accurate and timely predictions; they often lack interpretable reasoning to explain their predictions; and they are generally biased toward the general student body, ignoring the idiosyncrasies of underrepresented student populations (determined by socio-demographic factors such as race, gender, residency, or status as freshman, transfer, adult, or first-generation students), which traditionally face greater difficulties and performance gaps. This paper presents a multiview early warning system built with comprehensible Genetic Programming classification rules and adapted to specifically target underrepresented and underperforming student populations. The system integrates many student information repositories using multiview learning to improve the accuracy and timing of the predictions. Three interfaces have been developed to provide personalized and aggregated comprehensible feedback to students, instructors, and staff to facilitate early intervention and student support. Experimental results, validated with statistical analysis, indicate that this multiview learning approach outperforms traditional classifiers. Learning outcomes will help instructors and policy-makers deploy strategies to increase retention and improve academic performance.
