A Rough Set Model for Constraint Driven Mining of Sequential Patterns

Data mining and knowledge discovery methods serve many decision support and engineering application needs of various organizations. Most real-world data has an inherent time component. Sequential patterns are inter-event patterns, ordered in time, associated with the various objects under study. The frequent sequential patterns discovered under user-defined constraints are interesting data mining results. These patterns can serve a variety of enterprise applications concerning analytic and decision support needs. Imposing various constraints further enhances the quality of mining results and restricts the results to only relevant patterns. In this paper, we propose a rough set perspective on the problem of constraint-driven mining of sequential patterns. We use the indiscernibility relation from the theory of rough sets to partition the search space of sequential patterns and propose a novel algorithm that allows pre-visualization of patterns and imposition of various types of constraints in the mining task. The algorithm C-Rough Set Partitioning is at least ten times faster than the naive algorithm SPRINT that is based on imposing regular expression constraints.
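The partitioning idea can be illustrated with a minimal sketch. It is not the C-Rough Set Partitioning algorithm itself, only an example of how an indiscernibility relation splits a set of sequences into equivalence classes that can then be filtered by a constraint. The toy data, the choice of attribute (the first event of each sequence), the support threshold, and the function name `indiscernibility_partition` are all assumptions made for illustration.

```python
from collections import defaultdict


def indiscernibility_partition(sequences, key):
    """Group sequence ids into equivalence classes.

    Two sequences are indiscernible when `key` maps their event
    lists to the same value.
    """
    classes = defaultdict(list)
    for seq_id, events in sequences.items():
        classes[key(events)].append(seq_id)
    return dict(classes)


# Hypothetical toy data: object id -> time-ordered list of events.
sequences = {
    "s1": ["a", "b", "c"],
    "s2": ["a", "c"],
    "s3": ["b", "c"],
    "s4": ["a", "b"],
}

# Assumed attribute for the relation: the first event of each sequence.
partition = indiscernibility_partition(sequences, key=lambda events: events[0])

# Assumed constraint: keep only classes with at least `min_support` members.
min_support = 2
frequent = {k: ids for k, ids in partition.items() if len(ids) >= min_support}
```

Because each equivalence class can be inspected before any further mining, a constraint such as the support threshold here can discard whole classes at once instead of testing candidate patterns one by one.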
