Principal Component Analysis and Self-Organizing Map Clustering for Student Browsing Behaviour Analysis

Abstract Using e-learning as a learning platform gives students the flexibility to learn anywhere, without limitations of time and space. Moodle has become one of the leading learning management systems; it makes e-learning easy to deliver by giving educators customizable tools for deploying learning materials in various forms. Identifying student behavior is important because it allows educators to improve their teaching approach and enhance student performance. However, the flexibility of the learning environment makes student behavior more challenging to identify. In Moodle, students' interactions and activities while learning online are captured in log files. The data stored in the log files contain meaningful information such as student behavior, preferences and knowledge level. In this study, the raw dataset captured in the log file undergoes a pre-processing phase consisting of data cleaning, transformation and selection to prepare it for analysis. Principal Component Analysis (PCA), which is known to improve the accuracy of unsupervised learning techniques, is used to identify the most significant student features in the log file. The features selected using PCA are course view, notes, exercises, examples and assignments; these are fed into a Self-Organizing Map (SOM) to cluster students based on their behavior. The results show that the technique is able to cluster student browsing behavior using the total browsing frequency over a 14-week study session, forming two cluster groups: high browsing behavior and low browsing behavior. The quality of the SOM clustering is validated through internal clustering evaluation, and the results show that SOM produces better-quality cluster groups than k-means and Partitioning Around Medoids (PAM).
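The pipeline described in the abstract (standardize the per-student browsing counts, reduce them with PCA, cluster with a SOM, then check cluster quality with an internal index) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic toy counts rather than Moodle logs, the SOM is a hand-written two-unit one-dimensional map, the number of retained components and all hyperparameters are arbitrary, and the Dunn index is only one possible internal index, since the abstract does not name the indices used.

```python
import numpy as np

def standardize(X):
    """Z-score each feature column so no single count dominates."""
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std

def pca(X, n_components):
    """Project already-centered data onto its leading principal components."""
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    return X @ Vt[:n_components].T, explained[:n_components]

def train_som(X, n_units=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny 1-D SOM; hyperparameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    weights = X[rng.choice(len(X), n_units, replace=False)].copy()
    grid = np.arange(n_units)
    n_steps, step = epochs * len(X), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1 - frac)                    # decaying learning rate
            sigma = max(sigma0 * (1 - frac), 1e-2)   # shrinking neighborhood
            bmu = np.argmin(np.linalg.norm(weights - X[i], axis=1))
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
            weights += lr * h[:, None] * (X[i] - weights)
            step += 1
    return weights

def assign(X, weights):
    """Label each sample with the index of its best-matching unit."""
    d = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    return d.argmin(axis=1)

def dunn_index(X, labels):
    """Smallest between-cluster distance over largest within-cluster distance."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    return d[~same].min() / d[same].max()

# Synthetic 14-week totals per student (NOT real data). Columns:
# course view, notes, exercises, examples, assignments.
rng = np.random.default_rng(42)
high = rng.poisson(lam=[60, 40, 30, 25, 10], size=(20, 5))
low = rng.poisson(lam=[15, 8, 5, 4, 3], size=(20, 5))
X = np.vstack([high, low]).astype(float)

Z = standardize(X)
scores, explained = pca(Z, n_components=2)
weights = train_som(scores)
labels = assign(scores, weights)

print("explained variance ratio:", explained)
print("cluster sizes:", np.bincount(labels, minlength=2))
print("Dunn index:", dunn_index(scores, labels))
```

Note that in the study PCA is used to select features, whereas this sketch simply projects onto the leading components; a feature-selection variant would instead rank the original columns by their loadings. A higher Dunn index indicates more compact, better-separated clusters, so the same function could be applied to labels from k-means or PAM for a like-for-like comparison.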
