Efficient Sentinel Mining Using Bitmaps on Modern Processors

This paper proposes a highly efficient bitmap-based approach for the discovery of so-called sentinels. Sentinels represent schema-level relationships between changes over time in certain measures in a multidimensional data cube. Sentinels are actionable: based on previous observations, they warn users, for example, that revenue might drop within two months if an increase in customer problems combined with a decrease in website traffic is observed. We significantly extend prior art by expressing the sentinel mining problem as bitmap operations over bitmap-encoded so-called indication streams. We present a very efficient algorithm, SentBit, that is 2-3 orders of magnitude faster than the state of the art and exploits the CPU-specific instructions and multicore architectures available on modern processors. The SentBit algorithm scales efficiently to very large data sets, which is verified by extensive experiments on both real and synthetic data.
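To make the bitmap idea concrete, the following is a minimal sketch and not the SentBit algorithm itself. It assumes an indication is a relative change between consecutive periods that meets a threshold, stores positive and negative indications of a measure as two bitmaps, and counts how often a source indication is followed by a target indication a fixed number of periods later using a shift, an AND, and a bit count. The function names, the threshold, and the sample data are hypothetical, and Python integers stand in for machine words. Many CPUs offer the bit count as a single population-count instruction, but whether SentBit relies on it is an assumption here.

```python
def encode_indications(values, threshold):
    """Return (pos, neg) integer bitmaps for one measure.

    Bit t is set in pos when the relative change from period t to t + 1
    is at least +threshold, and in neg when it is at most -threshold.
    """
    pos = neg = 0
    for t in range(len(values) - 1):
        prev, curr = values[t], values[t + 1]
        if prev == 0:
            continue  # relative change undefined; treated as neutral
        change = (curr - prev) / prev
        if change >= threshold:
            pos |= 1 << t
        elif change <= -threshold:
            neg |= 1 << t
    return pos, neg


def count_support(source_bits, target_bits, offset):
    """Count periods t where a source indication at t is followed by a
    target indication at t + offset (shift, AND, then count set bits)."""
    return bin(source_bits & (target_bits >> offset)).count("1")


# Hypothetical data: customer problems and revenue over six periods.
problems = [100, 120, 121, 150, 151, 151]
revenue = [1000, 1000, 1000, 800, 1000, 700]

problems_up, _ = encode_indications(problems, 0.1)
_, revenue_down = encode_indications(revenue, 0.1)

# How often is a rise in problems followed by a revenue drop two periods later?
support = count_support(problems_up, revenue_down, offset=2)
```

In this toy data both rises in customer problems are followed by a revenue drop two periods later, so `support` is 2. A real miner would also count contradicting cases and combine several source measures, which this sketch leaves out.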
