A data mining approach to guide students through the enrollment process based on academic performance

Student academic performance is a key input for university education management systems. Many actions and decisions depend on it, particularly during the enrollment process, when students must decide which courses to sign up for. This research presents the rationale behind the design of a recommender system that supports the enrollment process using the students’ academic performance records. To build this system, the CRISP-DM methodology was applied to data from students of the Computer Science Department at the University of Lima, Perú. One of the main contributions of this work is the use of two synthetic attributes to improve the relevance of the recommendations. The first attribute estimates the inherent difficulty of a given course. The second attribute, named potential, measures the competence of a student for a given course based on the grades obtained in related courses. The data were mined using C4.5, KNN (K-nearest neighbor), Naïve Bayes, Bagging and Boosting, and a set of experiments was run to determine the best algorithm for this application domain. The results indicate that Bagging achieved the best predictive accuracy. Based on these results, the “Student Performance Recommender System” (SPRS) was developed, including a learning engine. SPRS was tested with a sample group of 39 students during the enrollment process, and the results showed that the system performed very well under real-life conditions.
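As an illustration of the two synthetic attributes, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give the formulas, so both definitions are assumptions: difficulty is taken as one minus the mean normalized grade of all students in a course, and potential as the student's mean normalized grade in related courses. The sample records, the 0 to 20 grade scale, the `RELATED` map, and the function names are all hypothetical.

```python
from statistics import mean

# Hypothetical grade records: (student, course, grade). The 0-20 scale is an assumption.
GRADES = [
    ("ana", "calculus_1", 14),
    ("ana", "programming_1", 17),
    ("luis", "calculus_1", 9),
    ("luis", "programming_1", 12),
    ("luis", "data_structures", 10),
]

# Hypothetical map from a course to the earlier courses treated as related to it.
RELATED = {"data_structures": ["programming_1", "calculus_1"]}

MAX_GRADE = 20


def course_difficulty(course, grades=GRADES):
    """Assumed definition: 1 minus the mean normalized grade in the course.

    Returns None when no grades are recorded for the course.
    """
    marks = [g for _, c, g in grades if c == course]
    if not marks:
        return None
    return 1 - mean(marks) / MAX_GRADE


def student_potential(student, course, grades=GRADES, related=RELATED):
    """Assumed definition: the student's mean normalized grade in related courses.

    Returns None when the student has no grades in any related course.
    """
    related_courses = set(related.get(course, []))
    marks = [g for s, c, g in grades if s == student and c in related_courses]
    if not marks:
        return None
    return mean(marks) / MAX_GRADE
```

Under these assumptions, both values fall between 0 and 1 and could be appended as extra columns to each (student, course) record before training any of the classifiers listed in the abstract.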
