Recent work has proposed the concept of probabilistic frequent itemsets (PFIs) in uncertain databases, together with a mining algorithm based on possible-worlds semantics and a dynamic programming approach to frequency calculation. Under this scheme, the frequentness of an itemset is characterized by the Poisson binomial distribution. Subsequent work extended these concepts to probabilistic frequent closed itemsets (PFCIs) in order to reduce the size and redundancy of the output. Separately, the computation of PFIs has been accelerated by mining approximate probabilistic frequent itemsets (A-PFIs), exploiting the fact that the Poisson distribution closely approximates the Poisson binomial distribution, especially when the database is large. In this paper, we introduce the concept of approximate probabilistic frequent closed itemsets (A-PFCIs) and an algorithm, A-PFCIM, for mining them. An experimental evaluation shows that mining A-PFCIs can be orders of magnitude faster than mining traditional PFCIs.
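To make the approximation concrete, the following minimal sketch compares an exact frequentness probability, computed by dynamic programming over the Poisson binomial distribution, with its Poisson approximation. It is an illustration only and not the A-PFCIM algorithm: the function names `exact_frequentness` and `poisson_frequentness`, the input `probs` (one containment probability per transaction), and the threshold `minsup` are hypothetical, and the transactions are assumed to be mutually independent.

```python
import math


def exact_frequentness(probs, minsup):
    """P(support >= minsup) under the Poisson binomial distribution.

    probs[i] is the (assumed independent) probability that transaction i
    contains the itemset. Runs in O(n^2) time for n transactions.
    """
    # dist[k] = probability that exactly k of the transactions processed
    # so far contain the itemset.
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # transaction does not contain it
            nxt[k + 1] += mass * p       # transaction contains it
        dist = nxt
    return sum(dist[minsup:])


def poisson_frequentness(probs, minsup):
    """Approximate P(support >= minsup) with a Poisson distribution whose
    rate is the expected support. Runs in O(n + minsup) time."""
    lam = sum(probs)
    term = math.exp(-lam)  # Poisson pmf at k = 0
    cdf = 0.0
    for k in range(minsup):
        cdf += term
        term *= lam / (k + 1)  # pmf(k + 1) from pmf(k)
    return 1.0 - cdf


if __name__ == "__main__":
    # Large database, small per-transaction probabilities.
    probs = [0.01] * 1000
    print(exact_frequentness(probs, 10), poisson_frequentness(probs, 10))
```

On a tiny input such as `[0.5, 0.5]` the two values differ noticeably, whereas for many transactions with small probabilities they nearly coincide, which matches the observation that the approximation works best on large databases.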