Mining top-k sequential patterns under leverage

This paper presents a framework for the exact discovery of the top-k sequential patterns under leverage. It combines (1) a novel definition of the expected support of a sequential pattern, a concept on which most interestingness measures directly rely, with (2) SkOPUS, a new branch-and-bound algorithm for the exact discovery of top-k sequential patterns under a given measure of interest. Our interestingness measure employs the partition approach: a pattern is interesting to the extent that it is more frequent than can be explained by assuming independence between any of the pairs of patterns from which it can be composed. The larger the support relative to the expectation under independence, the more interesting the pattern. We build on these two elements to extract exactly the k sequential patterns with the highest leverage, consistent with our definition of expected support. Experiments on both synthetic data with known patterns and real-world datasets confirm the consistency and relevance of our approach with regard to the state of the art.
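The idea of scoring a pattern by how far its support exceeds an expectation under independence can be illustrated with a small sketch. This is not SkOPUS and not the paper's definition of expected support. It is a simplified stand-in that assumes (a) only contiguous prefix/suffix splits are considered, (b) the expected support of a split is the plain product of the two relative supports, and (c) candidates are enumerated by brute force rather than by branch-and-bound. All function names are hypothetical.

```python
import heapq
from itertools import product


def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Fraction of sequences in the database that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


def leverage(pattern, database):
    """Support minus a simplified expected support (assumption, see lead-in).

    The expectation is the largest product of supports over all
    prefix/suffix splits, so a pattern only scores well if no split
    explains its frequency. Requires len(pattern) >= 2.
    """
    expected = max(
        support(pattern[:i], database) * support(pattern[i:], database)
        for i in range(1, len(pattern))
    )
    return support(pattern, database) - expected


def top_k_by_leverage(database, k, max_len=3):
    """Brute-force top-k: score every pattern of length 2..max_len."""
    alphabet = sorted({item for s in database for item in s})
    candidates = (
        p for n in range(2, max_len + 1) for p in product(alphabet, repeat=n)
    )
    scored = ((leverage(p, database), p) for p in candidates)
    return heapq.nlargest(k, scored)


# Toy database: "a then b" co-occurs in half of the sequences.
db = [("a", "b"), ("a", "b"), ("c",), ("c",)]
best = top_k_by_leverage(db, k=1, max_len=2)
```

In the toy database, the pattern (a, b) has support 0.5 while the product of the supports of a and b is 0.25, so its leverage under these simplifying assumptions is 0.25. The exhaustive enumeration is exponential in the pattern length; a branch-and-bound search would instead prune candidates whose bound cannot reach the current k-th best score.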
