Mining Frequent Patterns in Data Streams at Multiple Time Granularities

Although frequent-pattern mining has been widely studied and used, extending it to data streams is challenging. Compared with mining a static transaction data set, the streaming case has far more information to track and far greater complexity to manage. Infrequent items can become frequent later on and hence cannot be ignored. The storage structure must be adjusted dynamically to reflect the evolution of itemset frequencies over time. In this paper, we propose computing and maintaining all the frequent patterns (a set that is usually more stable and smaller than the streaming data) and dynamically updating them as the data stream arrives. We extend the framework to mine time-sensitive patterns with an approximate support guarantee. We incrementally maintain tilted-time windows for each pattern at multiple time granularities. Interesting
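To make the idea of a tilted-time window concrete, the following is a minimal sketch of one possible design, a logarithmic scheme in which recent batches are kept at fine granularity and older batches are merged into progressively coarser windows. The class name `TiltedTimeWindow`, its methods, and the rule of keeping at most two counts per level are illustrative assumptions, not details taken from the abstract.

```python
class TiltedTimeWindow:
    """Hypothetical logarithmic tilted-time window for one pattern.

    levels[i] holds at most two support counts, newest first, and each
    count at level i summarizes 2**i consecutive batches of the stream.
    """

    def __init__(self):
        self.levels = []

    def add(self, count):
        """Record the pattern's support count for the newest batch."""
        carry = count
        for level in self.levels:
            level.insert(0, carry)
            if len(level) <= 2:
                return
            # Level overflowed: merge its two oldest counts into a single
            # count and pass it on to the next, coarser level.
            oldest = level.pop()
            second_oldest = level.pop()
            carry = oldest + second_oldest
        # Every existing level overflowed (or none exists yet).
        self.levels.append([carry])

    def windows(self):
        """Return (batches_covered, count) pairs, newest window first."""
        return [(2 ** i, c)
                for i, level in enumerate(self.levels)
                for c in level]


w = TiltedTimeWindow()
for batch_count in range(1, 9):  # eight batches with counts 1..8
    w.add(batch_count)
print(w.windows())
```

Under this assumed layout the number of stored counts grows logarithmically with the number of batches seen, while the sum over all windows still equals the pattern's total count, so support over a recent period can be answered approximately at the granularity of the surviving windows.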
