Web usage mining for predicting final marks of students that use Moodle courses

This paper shows how web usage mining can be applied in e-learning systems to predict the marks that university students will obtain in the final exam of a course. We have also developed a specific Moodle mining tool intended not only for data mining experts but also for newcomers such as instructors and courseware authors. The performance of different data mining techniques for classifying students is compared, using students' usage data from several Cordoba University Moodle courses in engineering. Several well-known classification methods have been used, such as statistical methods, decision trees, rule and fuzzy rule induction methods, and neural networks. We have carried out several experiments using both all available data and filtered data to try to obtain higher accuracy. Discretization and rebalancing pre-processing techniques have also been applied to the original numerical data to test whether better classifier models can be obtained. Finally, we show examples of some of the models discovered and explain that a classifier model appropriate for an educational environment has to be both accurate and comprehensible so that instructors and course administrators can use it for decision making. © 2010 Wiley Periodicals, Inc. Comput Appl Eng Educ 21: 135–146, 2013
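
The two pre-processing steps named in the abstract, discretization and rebalancing, can be illustrated with a minimal Python sketch. It is not the paper's tool or its exact procedure: equal-width binning and random oversampling are assumed here as common stand-ins, and the function names, bin labels, and sample values are hypothetical.

```python
import random
from collections import Counter


def equal_width_bins(values, n_bins=3, labels=("LOW", "MEDIUM", "HIGH")):
    """Discretize numeric values into equal-width intervals (assumed technique)."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / n_bins or 1.0
    result = []
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        result.append(labels[idx])
    return result


def oversample(rows, classes, seed=0):
    """Rebalance by randomly duplicating minority-class rows (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(classes)
    target = max(counts.values())
    new_rows, new_classes = list(rows), list(classes)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, classes) if y == cls]
        for _ in range(target - n):
            new_rows.append(rng.choice(pool))
            new_classes.append(cls)
    return new_rows, new_classes


# Hypothetical usage data: number of quizzes taken by five students.
quizzes_taken = [0, 4, 9, 1, 8]
binned = equal_width_bins(quizzes_taken)

# Hypothetical, imbalanced final-mark classes for the same students.
marks = ["PASS", "PASS", "PASS", "FAIL", "PASS"]
balanced_rows, balanced_marks = oversample(binned, marks)
print(Counter(balanced_marks))
```

Discretized attributes tend to yield rules that are easier for instructors to read, while rebalancing keeps a classifier from simply predicting the majority mark; both effects would need to be checked against the data at hand.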
