FARP: Mining fuzzy association rules from a probabilistic quantitative database

Current studies on association rule mining focus on finding Boolean or quantitative association rules in certain databases, or Boolean association rules in probabilistic databases. However, little work has addressed mining association rules from probabilistic quantitative databases, because measuring quantitative information and probability simultaneously is difficult. By introducing a novel Shannon-like entropy, we aggregate and measure the information contained in an item in which fuzzy uncertainty, hidden in quantitative values, coexists with random uncertainty. We then propose Support and Confidence metrics for a fuzzy-probabilistic database to quantify association rules. Finally, we design an algorithm, called FARP (mining Fuzzy Association Rules from a Probabilistic quantitative database), that uses the proposed interest measures to discover frequent fuzzy-probabilistic itemsets and fuzzy association rules. The experimental results show the effectiveness of our method and its practicality in real-world applications.
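The abstract does not give FARP's actual definitions of the Shannon-like entropy, Support, or Confidence, so the following is only a rough illustration of what support and confidence over fuzzy-probabilistic data can look like. It assumes a common construction that is not taken from the paper: each transaction assigns every fuzzy item (attribute, linguistic term) a membership degree and an existence probability; an itemset's degree in a transaction is the minimum membership multiplied by the product of the probabilities (items treated as independent); support is the mean degree over the database. The toy data, the function names (`degree`, `support`, `confidence`, `frequent_itemsets`, `rules`), and the Apriori-style search are all hypothetical, and the entropy-based aggregation is not modeled.

```python
from itertools import combinations

# Hypothetical data model (an assumption, not FARP's representation):
# a transaction maps a fuzzy item (attribute, linguistic term) to a pair
# (membership degree in [0, 1], existence probability in [0, 1]).
Item = tuple[str, str]
Transaction = dict[Item, tuple[float, float]]

# Toy probabilistic quantitative database, already fuzzified (made-up values).
DB: list[Transaction] = [
    {("age", "young"): (0.8, 0.9), ("income", "high"): (0.6, 1.0)},
    {("age", "young"): (1.0, 0.5), ("income", "high"): (0.3, 0.8)},
    {("age", "old"): (0.7, 1.0), ("income", "high"): (0.9, 0.6)},
]


def degree(t: Transaction, itemset: frozenset) -> float:
    """Degree to which transaction t supports itemset.

    Assumed definition: min t-norm over memberships, multiplied by the
    product of existence probabilities (items treated as independent).
    """
    if not all(i in t for i in itemset):
        return 0.0
    prob = 1.0
    for i in itemset:
        prob *= t[i][1]
    return min(t[i][0] for i in itemset) * prob


def support(db: list[Transaction], itemset: frozenset) -> float:
    """Assumed expected fuzzy support: mean degree over all transactions."""
    return sum(degree(t, itemset) for t in db) / len(db)


def confidence(db: list[Transaction], antecedent: frozenset, consequent: frozenset) -> float:
    """Assumed confidence: support(A | C) / support(A)."""
    denom = support(db, antecedent)
    return support(db, antecedent | consequent) / denom if denom > 0 else 0.0


def frequent_itemsets(db: list[Transaction], min_sup: float) -> dict[frozenset, float]:
    """Apriori-style level-wise search.

    Pruning is sound for the assumed support: adding an item can only lower
    the min membership and multiply the probability by a value in [0, 1].
    """
    items = {i for t in db for i in t}
    level = [frozenset([i]) for i in items if support(db, frozenset([i])) >= min_sup]
    result: dict[frozenset, float] = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(db, s)
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in candidates
            # one linguistic term per attribute, e.g. not both "young" and "old"
            if len({attr for attr, _ in c}) == k
            # every (k-1)-subset must already be frequent
            and all(frozenset(s) in result for s in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(db, c) >= min_sup]
    return result


def rules(freq: dict[frozenset, float], min_conf: float) -> list[tuple[frozenset, frozenset, float]]:
    """Generate A -> C rules from frequent itemsets, reusing stored supports."""
    out = []
    for s, sup in freq.items():
        for r in range(1, len(s)):
            for ante in map(frozenset, combinations(s, r)):
                if freq[ante] > 0 and sup / freq[ante] >= min_conf:
                    out.append((ante, s - ante, sup / freq[ante]))
    return out


if __name__ == "__main__":
    freq = frequent_itemsets(DB, min_sup=0.2)
    for ante, cons, conf in rules(freq, min_conf=0.5):
        print(sorted(ante), "->", sorted(cons), round(conf, 3))
```

On the toy data, {young, high} reaches support 0.22 while {old, high} falls below the 0.2 threshold, so only the rule (age, young) -> (income, high) survives, with confidence of roughly 0.54. Reusing the supports stored in `freq` during rule generation works because every subset of a frequent itemset is itself frequent under the assumed measure.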
