Today's Recommendations

2009 - KDD

Probabilistic frequent itemset mining in uncertain databases

Probabilistic frequent itemset mining in uncertain transaction databases differs semantically and computationally from traditional techniques applied to standard "certain" transaction databases. The consideration of existential uncertainty of item(sets), indicating the probability that an item(set) occurs in a transaction, makes traditional techniques inapplicable. In this paper, we introduce new probabilistic formulations of frequent itemsets based on possible worlds semantics. In this probabilistic context, an itemset X is called frequent if the probability that X occurs in at least minSup transactions is above a given threshold τ. To the best of our knowledge, this is the first approach addressing this problem under possible worlds semantics. Building on these probabilistic formulations, we present a framework that solves the Probabilistic Frequent Itemset Mining (PFIM) problem efficiently. An extensive experimental evaluation investigates the impact of our proposed techniques and shows that our approach is orders of magnitude faster than straightforward approaches.
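The frequentness test described in this abstract (an itemset X is frequent if the probability that it occurs in at least minSup transactions exceeds τ) can be illustrated with a small dynamic program. This is a minimal sketch, not the paper's framework: it assumes each transaction contains X independently with a known probability, and the function names `support_pmf`, `frequentness_probability`, and `is_probabilistic_frequent` are hypothetical.

```python
from typing import List, Sequence


def support_pmf(probs: Sequence[float]) -> List[float]:
    """Return pmf where pmf[k] = P(support of X == k).

    Assumption: probs[j] is the probability that transaction j contains X,
    and transactions are independent of each other.
    """
    pmf = [1.0]  # with zero transactions the support is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # X absent from this transaction
            nxt[k + 1] += mass * p       # X present in this transaction
        pmf = nxt
    return pmf


def frequentness_probability(probs: Sequence[float], min_sup: int) -> float:
    """P(X occurs in at least min_sup transactions)."""
    return sum(support_pmf(probs)[min_sup:])


def is_probabilistic_frequent(probs: Sequence[float], min_sup: int, tau: float) -> bool:
    """X is a probabilistic frequent itemset if the probability exceeds tau."""
    return frequentness_probability(probs, min_sup) > tau


if __name__ == "__main__":
    # Hypothetical per-transaction probabilities for one itemset X.
    probs = [0.8, 0.5, 0.4]
    print(frequentness_probability(probs, 2))
    print(is_probabilistic_frequent(probs, 2, tau=0.5))
```

The sketch enumerates no possible worlds explicitly; it aggregates them by support count, which takes time quadratic in the number of transactions instead of exponential.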

2012 - arXiv

Mining Frequent Itemsets over Uncertain Databases

In recent years, due to the wide applications of uncertain data, mining frequent itemsets over uncertain databases has attracted much attention. In uncertain databases, the support of an itemset is a random variable instead of a fixed occurrence count of this itemset. Thus, unlike the corresponding problem in deterministic databases, where the frequent itemset has a unique definition, the frequent itemset under uncertain environments has two different definitions so far. The first definition, referred to as the expected support-based frequent itemset, employs the expectation of the support of an itemset to measure whether this itemset is frequent. The second definition, referred to as the probabilistic frequent itemset, uses the probability of the support of an itemset to measure its frequency. Thus, existing work on mining frequent itemsets over uncertain databases is divided into two different groups, and no study has been conducted to comprehensively compare the two definitions. In addition, since no uniform experimental platform exists, current solutions for the same definition even generate inconsistent results. In this paper, we first aim to clarify the relationship between the two definitions. Through extensive experiments, we verify that the two definitions have a tight connection and can be unified when the size of the data is large enough. Second, we provide baseline implementations of eight existing representative algorithms and test their performance fairly with uniform measures. Finally, based on fair tests over many different benchmark data sets, we clarify several existing inconsistent conclusions and discuss some new findings.
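The two definitions contrasted in this abstract can be placed side by side in a short sketch. The assumptions here are ours, not necessarily the paper's: transactions are independent, the thresholds are absolute counts, and the link between the definitions is illustrated with a Normal approximation of the support distribution (with a continuity correction), which only becomes accurate when the number of transactions is large. All function names are hypothetical.

```python
import math
from typing import Sequence


def expected_support(probs: Sequence[float]) -> float:
    """Expectation of the support: the sum of per-transaction probabilities."""
    return sum(probs)


def support_variance(probs: Sequence[float]) -> float:
    """Variance of the support under independent transactions."""
    return sum(p * (1.0 - p) for p in probs)


def is_expected_frequent(probs: Sequence[float], min_esup: float) -> bool:
    """First definition: expected support reaches the threshold min_esup."""
    return expected_support(probs) >= min_esup


def normal_frequentness(probs: Sequence[float], min_sup: int) -> float:
    """Approximate P(support >= min_sup) from the mean and variance only.

    Assumption: a Normal approximation with continuity correction,
    used here purely for illustration.
    """
    mu = expected_support(probs)
    var = support_variance(probs)
    if var == 0.0:  # every probability is 0 or 1, so the support is fixed
        return 1.0 if mu >= min_sup else 0.0
    z = (min_sup - 0.5 - mu) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)


def is_probabilistic_frequent(probs: Sequence[float], min_sup: int, tau: float) -> bool:
    """Second definition, evaluated with the approximation above."""
    return normal_frequentness(probs, min_sup) > tau


if __name__ == "__main__":
    probs = [0.5] * 100  # hypothetical data: 100 transactions
    print(expected_support(probs), is_expected_frequent(probs, 50))
    print(normal_frequentness(probs, 50))
```

Under this approximation, the frequentness probability depends only on the expected support and its variance, which gives one intuition for why the two definitions might converge on large data sets.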
