Mining Frequent Itemsets from Sparse Data Streams in Limited Memory Environments

Many applications, such as Web click streams and wireless sensor networks, produce floods of data. Hence, algorithms for mining frequent itemsets from data streams are in demand. Many existing stream mining algorithms capture important streaming data and assume that the captured data can fit into main memory. However, problems arise when the available memory is so limited that this assumption does not hold. In this paper, we present a data structure called DSTable that captures important data from the streams on disk. The DSTable is easy to maintain and is applicable to mining frequent itemsets from streams (especially sparse data) in limited memory environments.
