Infominer: mining surprising periodic patterns

In this paper, we focus on mining surprising periodic patterns in a sequence of events. In many applications, e.g., computational biology, an infrequent pattern is still considered very significant if its actual occurrence frequency exceeds the prior expectation by a large margin. Traditional metrics such as support are not necessarily ideal for measuring this kind of surprising pattern, because they treat all patterns equally: every occurrence carries the same weight in assessing the significance of a pattern, regardless of its probability of occurrence. A more suitable measure, information, is introduced to value the degree of surprise of each occurrence of a pattern as a continuous and monotonically decreasing function of its probability of occurrence. This allows patterns with vastly different occurrence probabilities to be handled seamlessly. The concept of information gain, defined as the accumulated degree of surprise of all repetitions of a pattern, is proposed to measure the overall degree of surprise of the pattern within a data sequence. Because the information gain measure violates the downward closure property, standard pruning does not apply; we identify the bounded information gain property to overcome this difficulty, and it in turn yields an efficient solution to the problem. Empirical tests demonstrate the efficiency and the usefulness of the proposed model.
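The idea of weighting each occurrence by its surprise can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the information of an event is taken to be the negative logarithm of its probability (one continuous, monotonically decreasing choice), `*` is a hypothetical wildcard position carrying no information, occurrences are counted over non-overlapping segments of the pattern's period, and the gain is simply the pattern's information multiplied by its occurrence count. The paper's exact definitions may differ.

```python
import math
from collections import Counter

WILDCARD = "*"  # hypothetical "don't care" position, assumed to carry no information


def event_information(prob, base):
    """Surprise of one event: decreases continuously as prob increases (assumed -log form)."""
    return -math.log(prob, base)


def pattern_information(pattern, probs, base):
    """Sum of the information of the non-wildcard events in the pattern."""
    return sum(event_information(probs[e], base) for e in pattern if e != WILDCARD)


def count_occurrences(sequence, pattern):
    """Count matches over consecutive, non-overlapping segments of length len(pattern)."""
    period = len(pattern)
    count = 0
    for start in range(0, len(sequence) - period + 1, period):
        segment = sequence[start:start + period]
        if all(p == WILDCARD or p == s for p, s in zip(pattern, segment)):
            count += 1
    return count


def information_gain(sequence, pattern, probs, base):
    """Accumulated surprise: pattern information times its number of occurrences (assumed)."""
    return pattern_information(pattern, probs, base) * count_occurrences(sequence, pattern)


# Toy sequence with period 3; probabilities are estimated from the sequence itself.
sequence = list("abcabdabc")
counts = Counter(sequence)
probs = {event: n / len(sequence) for event, n in counts.items()}
base = len(counts)  # log base = number of distinct events (an assumption)

frequent = ("a", "b", WILDCARD)
rare = (WILDCARD, WILDCARD, "d")
print(information_gain(sequence, frequent, probs, base))
print(information_gain(sequence, rare, probs, base))
```

Under these assumptions, a single occurrence of the rare event `d` contributes more surprise than a single occurrence of `a` or `b`, which is the behaviour a support count cannot express.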
