A Multi-Granularity Pattern-Based Sequence Classification Framework for Educational Data

In many application domains, such as education, sequences of events occurring over time must be studied to understand the process that generates them and, hence, to classify new examples. In this paper, we propose a novel multi-granularity sequence classification framework that generates features from frequent patterns mined at multiple levels of time granularity. Feature selection techniques are then applied to identify the most informative features, which are used to construct the classification model. We demonstrate the applicability and suitability of the proposed framework for educational data mining through experiments on an educational dataset collected from an asynchronous communication tool in which students interact to complete a shared group project. The experimental results show that our model achieves competitive performance in detecting students' roles in their respective projects, compared with a similarity-based baseline approach.
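The abstract does not specify the pattern-mining algorithm, the granularities, the feature selection criterion, or the classifier, so the following is only a minimal sketch of the general pipeline under assumptions of our own. Each student log is a hypothetical list of `(time_step, action)` pairs; a granularity is a window width in time steps, and each window is collapsed into one token made of its distinct actions. Patterns are contiguous runs of window tokens (a simplification of general sequential pattern mining, which also allows gaps), kept if they appear in at least `min_support` logs. Features are then ranked by information gain, a swapped-in criterion chosen for illustration. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def to_sequence(events, granularity):
    """Bucket (time_step, action) events into windows of `granularity` steps.

    Each window becomes one token: its distinct actions, sorted and joined by '+'.
    """
    windows = {}
    for t, action in events:
        windows.setdefault(t // granularity, set()).add(action)
    return ["+".join(sorted(windows[w])) for w in sorted(windows)]


def contiguous_patterns(seq, max_len):
    """Set of all contiguous subsequences of length 1..max_len, as tuples."""
    return {tuple(seq[i:i + n]) for n in range(1, max_len + 1)
            for i in range(len(seq) - n + 1)}


def mine_features(logs, granularities, max_len=2, min_support=2):
    """Return (granularity, pattern) pairs found in at least min_support logs."""
    support = Counter()
    for events in logs:
        for g in granularities:
            # A set of patterns per log, so each log counts at most once.
            for p in contiguous_patterns(to_sequence(events, g), max_len):
                support[(g, p)] += 1
    return sorted(f for f, s in support.items() if s >= min_support)


def vectorize(events, features, granularities, max_len=2):
    """Binary vector: 1 if the log contains the feature's pattern at its granularity."""
    present = set()
    for g in granularities:
        present |= {(g, p) for p in contiguous_patterns(to_sequence(events, g), max_len)}
    return [1 if f in present else 0 for f in features]


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values()) if n else 0.0


def information_gain(column, labels):
    """Reduction in label entropy obtained by splitting on a binary feature."""
    gain = entropy(labels)
    for v in (0, 1):
        subset = [y for x, y in zip(column, labels) if x == v]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain


def select_top_k(X, labels, k):
    """Indices of the k features with the highest information gain (stable on ties)."""
    scores = [information_gain([row[j] for row in X], labels) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


# Hypothetical toy data: four students' action logs and their roles.
logs = [
    [(0, "post"), (1, "reply"), (8, "post")],
    [(0, "post"), (2, "reply"), (9, "reply")],
    [(0, "read"), (3, "read"), (7, "read")],
    [(1, "read"), (5, "read"), (10, "post")],
]
roles = ["leader", "leader", "member", "member"]
granularities = [1, 7]  # e.g. daily and weekly windows if a time step is one day

features = mine_features(logs, granularities)
X = [vectorize(e, features, granularities) for e in logs]
top = select_top_k(X, roles, k=3)
print([features[j] for j in top])
```

The reduced vectors `[[row[j] for j in top] for row in X]` could then be passed to any off-the-shelf classifier; tagging each pattern with its granularity keeps, for example, a daily `post` pattern and a weekly `post` pattern as distinct features.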