Mining uncertain data

As an important data mining and knowledge discovery task, association rule mining searches for implicit, previously unknown, and potentially useful information embedded in the data, in the form of rules that reveal associative relationships among items. In general, the association rule mining process comprises two key steps: (1) mining frequent patterns (i.e., frequently occurring sets of items) from the data, and (2) using the mined frequent patterns to form association rules. The first step is the more computationally intensive of the two. In the early days, many algorithms were developed to mine frequent patterns from traditional transaction databases of precise data, such as shopping market basket data, in which the contents of the databases are known with certainty. However, we live in an uncertain world, and uncertain data can be found almost everywhere. Hence, in recent years, researchers have paid more attention to frequent pattern mining from probabilistic databases of uncertain data. In this paper, we review recent algorithmic developments in mining these probabilistic databases of uncertain data for frequent patterns.

© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 316–329 DOI: 10.1002/widm.31
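
To make the difference between precise and uncertain data concrete, the following is a minimal sketch, not the method of any particular algorithm reviewed in the paper. It assumes the existential uncertainty model commonly used in this literature: each item in a transaction carries an independent probability of actually being present, and the expected support of an itemset is the sum, over all transactions, of the product of its items' probabilities. The names `expected_support` and `mine_frequent` and the dictionary-based database layout are hypothetical choices for illustration; the search shown is a plain Apriori-style levelwise scan.

```python
from itertools import combinations
from math import prod

# Hypothetical representation of a probabilistic database (existential
# uncertainty model): each transaction maps an item to the probability that
# the item is actually present. Item occurrences are assumed independent.
ProbTransaction = dict[str, float]


def expected_support(itemset: frozenset, db: list[ProbTransaction]) -> float:
    """Sum over transactions of the product of the itemset's item
    probabilities; an item missing from a transaction contributes 0."""
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in db)


def mine_frequent(db: list[ProbTransaction], minsup: float) -> dict:
    """Apriori-style levelwise search for itemsets whose expected support
    is at least minsup. Returns a dict {itemset: expected support}."""
    items = sorted({item for t in db for item in t})
    level = [frozenset([item]) for item in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Keep the size-k candidates that meet the expected-support threshold.
        current = {}
        for candidate in level:
            support = expected_support(candidate, db)
            if support >= minsup:
                current[candidate] = support
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then drop any candidate
        # with an infrequent (k-1)-subset. This pruning is safe because adding
        # an item multiplies each term by a probability <= 1, so expected
        # support never increases as an itemset grows.
        prev = list(current)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(sub) in current for sub in combinations(c, k - 1))]
    return frequent
```

When every probability equals 1, each product is 1 or 0, so expected support reduces to the ordinary support count of precise data; the same levelwise search therefore covers the traditional market-basket setting as a special case.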
