Mining constrained frequent itemsets from distributed uncertain data

Nowadays, massive volumes of data can be generated from various sources (e.g., sensor data from environmental surveillance). Many existing distributed frequent itemset mining algorithms do not let users express, through constraints, which itemsets they intend to mine. Consequently, these unconstrained algorithms can return numerous itemsets that are of no interest to users. Moreover, because of inherent measurement inaccuracies and/or network latencies, the data are often riddled with uncertainty. These conditions call for both constrained mining and uncertain data mining. In this journal article, we propose a data-intensive computer system that uses a tree-based approach to mine frequent itemsets satisfying user-defined constraints from uncertain data in a distributed environment such as a wireless sensor network.

In summary, the proposed system performs tree-based frequent itemset mining over distributed uncertain data. It allows users to specify constraints that express their interests, and it finds frequent itemsets that satisfy succinct constraints from distributed uncertain data. It also handles non-succinct constraints (e.g., inductive succinct and anti-monotone ones).
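To make the three ingredients of the abstract concrete (uncertain data, user-defined constraints, and distributed sites), here is a minimal sketch under stated assumptions. It assumes the common expected-support model, in which each uncertain transaction maps an item to its existential probability and items are independent within a transaction. It also substitutes a simple Apriori-style level-wise search for the tree-based structures the article describes. The names `expected_support`, `mine_constrained`, `item_ok`, `set_ok`, the `price` table, and the sample data are all hypothetical, chosen only for illustration.

```python
from itertools import combinations

# Hypothetical model: each uncertain transaction maps item -> existential
# probability, and items are assumed independent within a transaction.


def expected_support(itemset, transactions):
    """Expected support = sum over transactions of the product of item probabilities."""
    total = 0.0
    for t in transactions:
        p = 1.0
        for item in itemset:
            p *= t.get(item, 0.0)  # an absent item has probability 0
        total += p
    return total


def mine_constrained(sites, minsup, item_ok, set_ok):
    """Level-wise (Apriori-style) constrained mining over partitioned uncertain data.

    sites:   list of transaction lists, one list per distributed site
    item_ok: succinct constraint pushed as a per-item filter
             (valid for forms such as max(S.price) <= v)
    set_ok:  anti-monotone constraint checked on each candidate
             (e.g. sum(S.price) <= v with non-negative prices)
    """
    # Succinct pushing: items violating item_ok never enter any candidate.
    items = sorted({i for site in sites for t in site for i in t if item_ok(i)})
    frequent = {}
    level = [frozenset([i]) for i in items]
    k = 1
    while level:
        current = {}
        for cand in level:
            if not set_ok(cand):
                continue  # anti-monotone: no superset can satisfy it either
            # Expected support is a sum over transactions, so it is additive
            # across disjoint sites: each site reports a local value and a
            # coordinator adds them up.
            esup = sum(expected_support(cand, site) for site in sites)
            if esup >= minsup:
                current[cand] = esup
        frequent.update(current)
        # Join step: expected support is itself anti-monotone (probabilities
        # are at most 1), so a (k+1)-candidate needs all k-subsets frequent.
        k += 1
        nxt = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                nxt.add(union)
        level = list(nxt)
    return frequent


if __name__ == "__main__":
    price = {"a": 10, "b": 25, "c": 40, "d": 5}
    sites = [
        [{"a": 0.9, "b": 0.8, "d": 0.5}, {"a": 0.5, "c": 1.0, "d": 0.6}],  # site 1
        [{"a": 0.6, "b": 0.5, "c": 0.4, "d": 1.0}],  # site 2
    ]
    result = mine_constrained(
        sites,
        minsup=1.0,
        item_ok=lambda i: price[i] <= 30,  # succinct: max(S.price) <= 30
        set_ok=lambda s: sum(price[i] for i in s) <= 30,  # anti-monotone: sum(S.price) <= 30
    )
    for itemset, esup in sorted(result.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), round(esup, 2))
```

With this sample data, item `c` never enters a candidate because of the succinct filter, and `{a, b}` is discarded by the sum constraint before its expected support is ever computed, which is the point of pushing constraints into the search. Succinct constraints that are not anti-monotone (for instance, requiring at least one cheap item) would need a different treatment that this sketch omits.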
