Predicting high-risk students using Internet access logs

Predicting student performance (PSP) is valuable from an educational perspective, especially for high-risk students who need timely help to complete their studies. Previous PSP studies have built prediction models mainly on data collected from questionnaires or from specific learning systems. This study instead uses students’ Internet access logs to predict high-risk students. Because the raw data in log files are high-dimensional, complex, and noisy, several preprocessing methods are proposed for this data source. A high-dimensional feature selection framework is then designed to prepare features for a prediction model that offers a good trade-off between computational efficiency and prediction performance. Experiments showed that the proposed prediction model can identify about 85% of high-risk students. Some online characteristics of high-risk students were also discovered, which might help student counselors and educational researchers better understand the relationship between students’ Internet use and their academic performance.
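
The pipeline summarized above (aggregate raw logs into per-student features, select features, then measure how many high-risk students are found) can be illustrated with a minimal sketch. This is not the authors' framework: the log row layout, the category names, and the use of absolute Pearson correlation as a simple filter-style ranking are all assumptions made for illustration, and "identify about 85%" is read here as recall on the high-risk class, which the abstract does not state explicitly.

```python
from collections import defaultdict
from math import sqrt


def build_features(log_rows):
    """Aggregate (student_id, category, seconds) rows into per-student totals.

    The row layout is a hypothetical stand-in for real access-log records.
    """
    features = defaultdict(lambda: defaultdict(float))
    for student_id, category, seconds in log_rows:
        features[student_id][category] += seconds
    return features


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(features, labels):
    """Rank categories by |correlation| with the high-risk label (1 = high-risk).

    A simple filter-style ranking, assumed here for illustration only.
    """
    students = sorted(labels)
    categories = sorted({c for s in students for c in features[s]})
    ys = [labels[s] for s in students]
    scores = {
        c: abs(pearson([features[s].get(c, 0.0) for s in students], ys))
        for c in categories
    }
    return sorted(scores, key=scores.get, reverse=True)


def recall(y_true, y_pred):
    """Fraction of truly high-risk students that the model flags."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data: every value below is invented for the example.
log_rows = [
    ("s1", "games", 300), ("s1", "study", 20), ("s1", "news", 50),
    ("s2", "games", 10), ("s2", "study", 200), ("s2", "news", 50),
    ("s3", "games", 250), ("s3", "study", 30), ("s3", "news", 50),
    ("s4", "games", 20), ("s4", "study", 180), ("s4", "news", 50),
]
labels = {"s1": 1, "s2": 0, "s3": 1, "s4": 0}

ranked = rank_features(build_features(log_rows), labels)
print(ranked)
```

In this toy data the constant "news" category carries no information about the label and is ranked last, which is the kind of feature a selection step would discard before a classifier is trained.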
