Mining Frequent Itemsets from Uncertain Data

We study the problem of mining frequent itemsets from uncertain data under a probabilistic framework. We consider transactions whose items are associated with existential probabilities and give a formal definition of frequent patterns under this uncertain data model. We show that traditional algorithms for mining frequent itemsets are either inapplicable or computationally inefficient under such a model. To improve mining efficiency, we propose a data trimming framework. Extensive experiments show that the data trimming technique achieves significant savings in both CPU cost and I/O cost.
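
To make the uncertain data model concrete, the following minimal sketch computes an expected support for an itemset. It rests on assumptions the abstract does not state: that frequency is measured by expected support, that item occurrences within a transaction are independent, and that an absent item has probability 0. The names `expected_support`, `is_frequent`, and `min_expected_support`, as well as the sample data, are hypothetical and chosen for illustration only.

```python
from math import prod

def expected_support(transactions, itemset):
    """Sum, over all transactions, of the probability that the
    transaction contains every item in `itemset`.

    Assumption: items are independent, so that probability is the
    product of the items' existential probabilities.
    """
    total = 0.0
    for t in transactions:
        # An item missing from the transaction has probability 0.
        total += prod(t.get(item, 0.0) for item in itemset)
    return total

def is_frequent(transactions, itemset, min_expected_support):
    """Hypothetical frequency test: expected support meets a threshold."""
    return expected_support(transactions, itemset) >= min_expected_support

# Illustrative data: each transaction maps an item to its
# existential probability.
transactions = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "c": 1.0},
    {"b": 0.4},
]

print(expected_support(transactions, ["a"]))
print(is_frequent(transactions, ["a", "b"], 1.0))
```

Under these assumptions, items with very low existential probabilities contribute little to any itemset's expected support, which gives some intuition for why trimming such items could reduce mining cost.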
