Incremental update on probabilistic frequent itemsets in uncertain databases

Mining frequent itemsets in an uncertain database is a computationally demanding problem. Most existing algorithms focus on improving mining efficiency under the assumption that the database is static. Uncertain databases, however, like certain (exact) databases, are constantly updated with newly appended transactions. As a result of these updates, some patterns may become obsolete and new ones may emerge. Re-mining the whole uncertain database from scratch is very time-consuming because of the frequentness probability computations involved. To tackle this maintenance problem, we propose an algorithm called p-FUP for the efficient incremental update of patterns in an uncertain database. The p-FUP algorithm, inspired by a threshold-based PFI-testing technique and the FUP algorithm, uses approximations to incrementally update and discover frequent itemsets in the uncertain database. Comprehensive experiments on both real and synthetic datasets show that p-FUP is on average 2.8 times faster than the re-mining-based algorithm and scales linearly.
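To illustrate why the frequentness probability computations are costly and why approximations help with appended transactions, the following is a minimal Python sketch. It is not the p-FUP algorithm. It assumes the common model in which each transaction contains the itemset independently with a known probability, so the support follows a Poisson binomial distribution. The function names (`frequentness_probability`, `poisson_frequentness`) and the sample probabilities are hypothetical and chosen only for illustration, and the Poisson approximation shown is one generic choice, not necessarily the approximation used by p-FUP.

```python
import math


def frequentness_probability(probs, minsup):
    """Exact Pr[support >= minsup] for independent per-transaction probabilities.

    Uses a dynamic program over the Poisson binomial distribution,
    costing O(len(probs) * minsup) time.
    """
    if minsup <= 0:
        return 1.0
    # dist[k] = Pr[exactly k containing transactions] for k < minsup;
    # dist[minsup] = Pr[at least minsup containing transactions].
    dist = [1.0] + [0.0] * minsup
    for p in probs:
        # Update from the top down so each entry reads the previous step's values.
        dist[minsup] += dist[minsup - 1] * p
        for k in range(minsup - 1, 0, -1):
            dist[k] = dist[k] * (1.0 - p) + dist[k - 1] * p
        dist[0] *= 1.0 - p
    return dist[minsup]


def poisson_frequentness(expected_support, minsup):
    """Approximate Pr[support >= minsup] with a Poisson(expected_support) tail."""
    term = math.exp(-expected_support)  # Pr[X = 0]
    cdf = 0.0
    for k in range(minsup):
        cdf += term
        term *= expected_support / (k + 1)  # Pr[X = k + 1] from Pr[X = k]
    return 1.0 - cdf


# Hypothetical existence probabilities of one itemset in the old database
# and in a batch of newly appended transactions.
old_probs = [0.9, 0.4, 0.7, 0.2]
new_probs = [0.8, 0.5]

# The expected support is additive, so an update only needs the new batch.
old_mu = sum(old_probs)
updated_mu = old_mu + sum(new_probs)

exact = frequentness_probability(old_probs + new_probs, minsup=3)
approx = poisson_frequentness(updated_mu, minsup=3)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The exact computation must revisit every transaction, whereas the approximate one depends only on the expected support, which can be maintained by adding the contribution of the appended transactions. This additive property is the kind of structure an incremental method can exploit; how p-FUP does so in detail is not specified here.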
