Mining Stream Data with Data Load Shedding Techniques using Self Adaptive Sliding Window Model

Frequent patterns are patterns that appear frequently in a data set, and frequent pattern mining searches for recurring relationships in a given data set. It is an important data mining task that plays a central role in mining associations and in correlation analysis among data. This work focuses on discovering frequent item sets in data-stream environments that may suffer from data overload. Stream data are data that flow into a system in vast volumes, change dynamically, and contain multidimensional features. Traditional frequent pattern algorithms are not suitable for finding frequent patterns in stream data. This paper proposes a frequent pattern mining algorithm that integrates two data overload handling mechanisms. The algorithm extracts basic information from the streaming data, namely the frequency of the data items, and keeps it as base information. On user request, it generates frequent item sets from the base information, using the approximate inclusion-exclusion technique to calculate their approximate counts. A self-adaptive sliding window time model is used to process the data stream. When data overload occurs, the algorithm chooses an overload handling mechanism based on the nature of the data. The experimental results show that the mining algorithm performs well in the data overload state and generates frequent item sets.
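The following is a minimal sketch of the general idea, not the algorithm proposed in the paper. It shows a sliding window that keeps per-item frequencies as base information, can shrink its capacity under load, and is paired with a simple stride-based load-shedding step. The class `AdaptiveWindow`, the function `shed_load`, and all parameters (`max_size`, `factor`, `capacity`, `min_support`) are hypothetical names chosen for illustration. The stride-based drop is a stand-in for the two overload mechanisms, which the abstract does not specify, and the approximate inclusion-exclusion step is not implemented here.

```python
from collections import Counter, deque
from math import ceil


class AdaptiveWindow:
    """Sliding window over transactions that keeps item counts as base information."""

    def __init__(self, max_size, min_size=1):
        self.max_size = max_size      # current window capacity (assumed parameter)
        self.min_size = min_size      # lower bound when shrinking (assumed parameter)
        self.window = deque()         # transactions currently inside the window
        self.item_counts = Counter()  # base information: frequency of each item

    def add(self, transaction):
        """Insert one transaction and evict the oldest ones if the window is full."""
        items = frozenset(transaction)
        self.window.append(items)
        self.item_counts.update(items)
        self._evict()

    def _evict(self):
        """Remove the oldest transactions until the window fits its capacity."""
        while len(self.window) > self.max_size:
            for item in self.window.popleft():
                self.item_counts[item] -= 1
                if self.item_counts[item] == 0:
                    del self.item_counts[item]

    def shrink(self, factor=0.5):
        """Reduce the window capacity, e.g. when an overload is detected."""
        self.max_size = max(self.min_size, int(self.max_size * factor))
        self._evict()

    def frequent_items(self, min_support):
        """Return the items whose relative frequency in the window is >= min_support."""
        threshold = min_support * len(self.window)
        return {item for item, count in self.item_counts.items() if count >= threshold}


def shed_load(batch, capacity):
    """Illustrative load shedding: keep every k-th transaction so that at most
    `capacity` transactions remain. This is an assumed policy, not the paper's."""
    if len(batch) <= capacity:
        return list(batch)
    step = ceil(len(batch) / capacity)
    return list(batch[::step])
```

In this sketch, a caller would pass each incoming batch through `shed_load` before feeding the surviving transactions to `AdaptiveWindow.add`, and call `shrink` when the arrival rate stays above what the system can process. Counts for larger item sets would then have to be estimated from the base information, which is the step the paper handles with approximate inclusion-exclusion.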
