Under Consideration for Publication in Knowledge and Information Systems Mining Sequential Patterns for Classification

While a number of efficient sequential pattern mining algorithms were developed over the years, they can still take a long time and produce a huge number of patterns, many of which are redundant. These properties are especially frustrating when the goal of pattern mining is to find patterns for use as features in classification problems. In this paper, we describe BIDE-Discriminative, a modification of BIDE that uses class information for direct mining of predictive sequential patterns. We then perform an extensive evaluation on nine real-life datasets of the different ways in which the basic BIDE-Discriminative can be used in real multi-class classification problems, including 1-versus-rest and model-based search tree approaches. The results of our experiments show that 1-versus-rest provides an efficient solution with good classification performance.

[1]  Eamonn J. Keogh,et al.  A symbolic representation of time series, with implications for streaming algorithms , 2003, DMKD '03.

[2]  Lars Schmidt-Thieme,et al.  Motif-Based Classification of Time Series with Bayesian Networks and SVMs , 2008, GfKl.

[3]  Albrecht Zimmermann,et al.  One in a million: picking the right patterns , 2008, Knowledge and Information Systems.

[4]  Dmitriy Fradkin,et al.  Margin-closed frequent sequential pattern mining , 2010, UP '10.

[5]  Gerhard Weikum,et al.  Fast logistic regression for text categorization with variable-length n-grams , 2008, KDD.

[6]  Fabian Mörchen,et al.  Optimizing time series discretization for knowledge discovery , 2005, KDD '05.

[7]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[8]  Takashi Washio,et al.  Pruning Strategies Based on the Upper Bound of Information Gain for Discriminative Subgraph Mining , 2009, PKAW.

[9]  Dimitrios Gunopulos,et al.  Discovering frequent arrangements of temporal intervals , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[10]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[11]  Mohammed J. Zaki,et al.  SPADE: An Efficient Algorithm for Mining Frequent Sequences , 2004, Machine Learning.

[12]  Alex Pentland,et al.  Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Paul R. Cohen,et al.  Learning and Playing in Wubble World , 2008, AIIDE.

[14]  Jiawei Han,et al.  Discriminative Frequent Pattern Analysis for Effective Classification , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[15]  Jian Pei,et al.  Sequence Data Mining , 2007, Advances in Database Systems.

[16]  Gösta Grahne,et al.  Efficiently Using Prefix-trees in Mining Frequent Itemsets , 2003, FIMI.

[17]  Urpo Tuomela,et al.  Sensor signal data set for exploring context recognition of mobile devices , 2004 .

[18]  Robert Givan,et al.  Learning models and formulas of a temporal event logic , 2004 .

[19]  Shinichi Morishita,et al.  Itemset Classified Clustering , 2004, PKDD.

[20]  Arno J. Knobbe,et al.  Pattern Teams , 2006, PKDD.

[21]  Milos Hauskrecht,et al.  A Pattern Mining Approach for Classifying Multivariate Temporal Data , 2011, 2011 IEEE International Conference on Bioinformatics and Biomedicine.

[22]  Philip S. Yu,et al.  Direct mining of discriminative and essential frequent patterns via model-based search tree , 2008, KDD.

[23]  Qiming Chen,et al.  PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[24]  Luc De Raedt,et al.  Don't Be Afraid of Simpler Patterns , 2006, PKDD.

[25]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.

[26]  Shinichi Morishita,et al.  Transversing itemset lattices with statistical metric pruning , 2000, PODS '00.

[27]  Milos Hauskrecht,et al.  Mining recent temporal patterns for event detection in multivariate time series data , 2012, KDD.

[28]  Hong Cheng,et al.  Mining closed discriminative dyadic sequential patterns , 2011, EDBT/ICDT '11.

[29]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[30]  Jiawei Han,et al.  Frequent Closed Sequence Mining without Candidate Maintenance , 2007, IEEE Transactions on Knowledge and Data Engineering.

[31]  R. Suganya,et al.  Data Mining Concepts and Techniques , 2010 .

[32]  Philip S. Yu,et al.  Direct Discriminative Pattern Mining for Effective Classification , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[33]  Zhuang Wang,et al.  Log-based predictive maintenance , 2014, KDD.

[34]  Jade Goldstein-Stewart,et al.  The use of MMR, diversity-based reranking for reordering documents and producing summaries , 1998, SIGIR '98.

[35]  瀬々 潤,et al.  Traversing Itemset Lattices with Statistical Metric Pruning (小特集 「発見科学」及び一般演題) , 2000 .

[36]  Jiawei Han,et al.  BIDE: efficient mining of frequent closed sequences , 2004, Proceedings. 20th International Conference on Data Engineering.

[37]  Joost N. Kok,et al.  Multi-class Correlated Pattern Mining , 2005, KDID.

[38]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[39]  Jian Pei,et al.  Data Mining: Concepts and Techniques, 3rd edition , 2006 .

[40]  Ling Huang,et al.  Mining Console Logs for Large-Scale System Problem Detection , 2008, SysML.

[41]  Yen-Liang Chen,et al.  Mining Nonambiguous Temporal Patterns for Interval-Based Events , 2007, IEEE Transactions on Knowledge and Data Engineering.

[42]  Das Amrita,et al.  Mining Association Rules between Sets of Items in Large Databases , 2013 .

[43]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[44]  Carsten Wiuf,et al.  Bounded coordinate-descent for biological sequence classification in high dimensional predictor space , 2010, KDD.

[45]  Mohammed Waleed Kadous,et al.  Temporal classification: extending the classification paradigm to multivariate time series , 2002 .

[46]  Salvatore Orlando,et al.  Fast and memory efficient mining of frequent closed itemsets , 2006, IEEE Transactions on Knowledge and Data Engineering.

[47]  Dino Pedreschi,et al.  Knowledge Discovery in Databases: PKDD 2004 , 2004, Lecture Notes in Computer Science.

[48]  Fabian Mörchen,et al.  Efficient mining of all margin-closed itemsets with applications in temporal knowledge discovery and classification by compression , 2010, Knowledge and Information Systems.

[49]  PeiJian,et al.  Mining Frequent Patterns without Candidate Generation , 2000 .

[50]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[51]  Umeshwar Dayal,et al.  PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth , 2001, ICDE 2001.

[52]  Jiawei Han,et al.  Classification of software behaviors for failure detection: a discriminative pattern mining approach , 2009, KDD.

[53]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[54]  Thomas M. Cover,et al.  Elements of information theory (2. ed.) , 2006 .

[55]  Fabian Mörchen,et al.  Efficient mining of understandable patterns from multivariate interval time series , 2007, Data Mining and Knowledge Discovery.

[56]  Thomas M. Cover,et al.  Elements of Information Theory: Cover/Elements of Information Theory, Second Edition , 2005 .

[57]  Jae-Gil Lee,et al.  Mining Discriminative Patterns for Classifying Trajectories on Road Networks , 2011, IEEE Transactions on Knowledge and Data Engineering.

[58]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..