Extracting Patterns from Educational Traces via Clustering and Associated Quality Metrics

Clustering algorithms, pattern mining techniques and associated quality metrics have emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specific characteristics of the available data, such as missing values, extreme values or outliers, make it challenging to extract user models that are significant from an educational perspective. In this paper we introduce a pattern detection mechanism within our data analytics tool, based on k-means clustering and on the SSE (sum of squared errors), silhouette, Dunn index and Xie-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative in classifying learners. Furthermore, the monitoring activities we performed created a strong basis for generating automatic feedback to learners about their course participation, based on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners at risk of failing the course, enabling tutors to take timely action.
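To make the four quality metrics concrete, the following is a minimal sketch of k-means together with SSE, silhouette, Dunn index and a crisp (hard-assignment) variant of the Xie-Beni index. It is not the implementation used in the tool described here. The function names, the deterministic farthest-point seeding, the crisp adaptation of Xie-Beni (which was originally defined for fuzzy partitions) and the toy two-dimensional points are all assumptions made for illustration.

```python
import math
from itertools import combinations


def init_centroids(points, k):
    # Assumption: deterministic farthest-point seeding, chosen for reproducibility.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    # Index of the nearest centroid for every point.
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the previous centroid if a cluster ends up empty.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def sse(points, labels, centroids):
    # Sum of squared distances from each point to its assigned centroid.
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))


def silhouette(points, labels):
    # Mean silhouette width; assumes at least two non-empty clusters.
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


def dunn(points, labels):
    # Smallest between-cluster distance divided by the largest cluster diameter.
    pairs = list(combinations(zip(points, labels), 2))
    separation = min(math.dist(p, q) for (p, lp), (q, lq) in pairs if lp != lq)
    diameter = max(math.dist(p, q) for (p, lp), (q, lq) in pairs if lp == lq)
    return separation / diameter


def xie_beni(points, labels, centroids):
    # Crisp adaptation: SSE / (n * smallest squared centroid separation).
    min_sep = min(math.dist(c, d) ** 2 for c, d in combinations(centroids, 2))
    return sse(points, labels, centroids) / (len(points) * min_sep)


# Hypothetical toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centroids = kmeans(points, k=2)
print("SSE:", sse(points, labels, centroids))
print("Silhouette:", silhouette(points, labels))
print("Dunn:", dunn(points, labels))
print("Xie-Beni:", xie_beni(points, labels, centroids))
```

Under these definitions, lower SSE and Xie-Beni values and higher silhouette and Dunn values indicate more compact, better separated clusters. Comparing the four scores across several values of k is one common way to choose the number of clusters.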
