Pattern Based Sequence Classification

Sequence classification is an important task in data mining. We address the problem of sequence classification using rules composed of interesting patterns found in a dataset of sequences with accompanying class labels. We measure the interestingness of a pattern in a given class of sequences by combining the cohesion and the support of the pattern. We use the discovered patterns to generate confident classification rules and present two different ways of building a classifier. The first classifier is based on an improved version of the existing method of classification based on association rules. The second ranks the rules by first measuring their value with respect to the specific new data object being classified. Experimental results show that our rule-based classifiers outperform existing comparable classifiers in terms of accuracy and stability. Additionally, we test a number of pattern-feature-based models that use different kinds of patterns as features to represent each sequence as a feature vector. We then apply a variety of machine learning algorithms for sequence classification, experimentally demonstrating that the patterns we discover represent the sequences well and prove effective for the classification task.
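To make the scoring idea concrete, here is a minimal Python sketch of how support, cohesion, interestingness, rule confidence and pattern feature vectors could be computed. It rests on several assumptions that the abstract does not state. Patterns are treated as unordered itemsets. Cohesion is taken as |X| divided by the average length of the shortest window containing X. The "combination" of support and cohesion is taken to be their product. All function names are hypothetical. This is an illustration, not the authors' exact definitions or implementation.

```python
from collections import Counter
from math import inf


def contains(seq, pattern):
    """True if every item of the (itemset) pattern occurs somewhere in seq."""
    return set(pattern) <= set(seq)


def min_window_length(seq, pattern):
    """Length of the shortest contiguous window of seq that contains every
    item of pattern; inf if seq does not contain the pattern."""
    need = set(pattern)
    if not need:
        return 0
    counts = Counter()
    covered = 0  # number of distinct pattern items inside the current window
    best = inf
    left = 0
    for right, item in enumerate(seq):
        if item in need:
            counts[item] += 1
            if counts[item] == 1:
                covered += 1
        # Shrink the window from the left while it still covers the pattern.
        while covered == len(need):
            best = min(best, right - left + 1)
            dropped = seq[left]
            if dropped in need:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    return best


def support(pattern, class_seqs):
    """Fraction of the sequences of one class that contain the pattern."""
    if not class_seqs:
        return 0.0
    return sum(contains(s, pattern) for s in class_seqs) / len(class_seqs)


def cohesion(pattern, class_seqs):
    """Assumed definition: |X| divided by the average minimal window length
    over the sequences of the class that contain X (0 if none do)."""
    windows = [min_window_length(s, pattern) for s in class_seqs if contains(s, pattern)]
    if not windows:
        return 0.0
    return len(set(pattern)) / (sum(windows) / len(windows))


def interestingness(pattern, class_seqs):
    """Assumed combination: the product of support and cohesion."""
    return support(pattern, class_seqs) * cohesion(pattern, class_seqs)


def confidence(pattern, label, dataset):
    """Confidence of the rule pattern => label over (sequence, label) pairs."""
    covered = [y for s, y in dataset if contains(s, pattern)]
    return covered.count(label) / len(covered) if covered else 0.0


def feature_vector(seq, patterns):
    """Binary feature vector: 1 if seq contains the pattern, else 0."""
    return [1 if contains(seq, p) else 0 for p in patterns]


if __name__ == "__main__":
    dataset = [
        (list("abcd"), "pos"),
        (list("acbd"), "pos"),
        (list("axxc"), "neg"),
        (list("bdxx"), "neg"),
    ]
    pos = [s for s, y in dataset if y == "pos"]
    print(interestingness({"a", "b"}, pos))          # 1.0 * (2 / 2.5) = 0.8
    print(confidence({"a", "b"}, "pos", dataset))    # 1.0
    print(feature_vector(list("abcd"), [{"a", "b"}, {"a", "c"}, {"x"}]))  # [1, 1, 0]
```

On the toy data, {a, b} occurs in both "pos" sequences with minimal windows of length 2 and 3, so it scores highly for that class. A rule built from it ({a, b} => pos) is fully confident. The same patterns can be used directly as binary features for an off-the-shelf learner.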
