Multiview Learning for Early Prognosis of Academic Performance: A Case Study

Educational data mining has attracted considerable attention from researchers in recent years and is an effective tool for uncovering the knowledge hidden in educational data. Semisupervised learning methods have gradually been applied to the educational process, demonstrating their usability and effectiveness. Cotraining is a representative semisupervised method that exploits both labeled and unlabeled examples, provided that each example is described by two feature views. Nevertheless, it has yet to be used in various scientific fields, the educational field among them, because the assumption that two feature views exist is not easily satisfied in practice. In this context, the main purpose of this study is to evaluate the efficiency of a proposed cotraining method for early prognosis of undergraduate students’ performance in the final examinations of a distance course, based on a large set of attributes that are naturally divided into two distinct views because they originate from different sources. More specifically, the first view consists of attributes describing students’ characteristics and academic achievements, which are filled out manually by their tutors, whereas the second consists of attributes tracking students’ online activity in the course learning management system, which are recorded automatically by the system. The experimental results demonstrate the superiority of the proposed cotraining method over state-of-the-art semisupervised and supervised methods.
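To make the two-view idea concrete, the following is a minimal sketch of a generic cotraining loop, not the method proposed in the study. Everything in it is an assumption made for illustration: the nearest-centroid base learner, the names `CentroidClassifier`, `cotrain`, and `predict_combined`, the synthetic data, and the reading of the first view as tutor-recorded attributes and the second as LMS activity. In each round, the classifier of each view labels the unlabeled examples it is most confident about and adds them to a shared labeled pool.

```python
import math
import random


class CentroidClassifier:
    """Tiny nearest-centroid learner; a stand-in for any base classifier."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        # Normalized exp(-distance) to each class centroid.
        scores = {c: math.exp(-math.dist(x, mu)) for c, mu in self.centroids.items()}
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()}

    def predict(self, x):
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)

    def confidence(self, x):
        return max(self.predict_proba(x).values())


def cotrain(view1, view2, labels, rounds=20, per_round=2):
    """labels[i] is a class label, or None for an unlabeled example."""
    labels = list(labels)  # work on a copy
    views = (view1, view2)
    clfs = (CentroidClassifier(), CentroidClassifier())

    def refit():
        known = [i for i, t in enumerate(labels) if t is not None]
        for clf, view in zip(clfs, views):
            clf.fit([view[i] for i in known], [labels[i] for i in known])

    refit()
    for _ in range(rounds):
        if all(t is not None for t in labels):
            break
        for clf, view in zip(clfs, views):
            # Each view pseudo-labels its most confident unlabeled examples.
            pool = [i for i, t in enumerate(labels) if t is None]
            pool.sort(key=lambda i: clf.confidence(view[i]), reverse=True)
            for i in pool[:per_round]:
                labels[i] = clf.predict(view[i])
        refit()
    return clfs, labels


def predict_combined(clfs, x1, x2):
    """Final prediction: multiply the class probabilities of both views."""
    p1, p2 = clfs[0].predict_proba(x1), clfs[1].predict_proba(x2)
    return max(p1, key=lambda c: p1[c] * p2[c])


# Synthetic, well-separated toy data (hypothetical, not the study's dataset).
random.seed(0)
truth = [0] * 20 + [1] * 20
view1 = [[5 * t + random.uniform(-1, 1), 5 * t + random.uniform(-1, 1)] for t in truth]
view2 = [[5 * t + random.uniform(-1, 1)] for t in truth]
labels = [None] * len(truth)
for i in (0, 1, 20, 21):  # only four examples start out labeled
    labels[i] = truth[i]

clfs, pseudo_labels = cotrain(view1, view2, labels)
accuracy = sum(p == t for p, t in zip(pseudo_labels, truth)) / len(truth)
print(f"pseudo-label accuracy: {accuracy:.2f}")
```

The sketch uses a shared labeled pool, so what one view learns confidently becomes training data for the other; a real application would replace the toy learner and data with a tuned classifier per view and held-out evaluation.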
