Paper information - Mining Frequent Itemsets in Uncertain Datasets

Mining Frequent Itemsets in Uncertain Datasets

Real-world data are usually noisy or uncertain. However, traditional data mining algorithms either ignore this uncertainty or account for it only in a very limited way. In this paper, we define a relatively generic model of data uncertainty in which each data item carries a “tag” that gives the degree of confidence in its value. This is more realistic in the many cases where data items are derived from other evidence or from more basic data. Simple examples are face recognition and fingerprint identification, where the raw data itself can influence the degree of confidence in the identification. As an example problem, we study frequent itemset mining over such uncertain data. With uncertain data, the frequent itemsets found will not be perfect: there will be false positives (itemsets estimated to be frequent that are not) and false negatives (frequent itemsets estimated not to be frequent). We consider several intuitive approaches and propose a new scheme that significantly reduces the number of false positives and false negatives.
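To make the data model concrete, here is a minimal Python sketch of transactions whose items carry confidence tags, together with two simple ways of estimating an itemset's support. The abstract does not specify the paper's approaches, so everything here is an illustrative assumption: the toy `transactions` data, the function names `thresholded_support` and `expected_support`, the 0.5 cutoff, and the treatment of confidences as independent probabilities. It is not the scheme the authors propose.

```python
# Hypothetical toy dataset: each transaction maps an item to a
# confidence tag in [0, 1] (assumed here to act like a probability).
transactions = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.6, "b": 0.4, "c": 0.9},
    {"a": 0.3, "c": 0.7},
    {"b": 0.9, "c": 0.2},
]


def thresholded_support(itemset, transactions, cutoff=0.5):
    """Count transactions where every item's confidence is >= cutoff.

    This treats an item as present or absent and discards the
    rest of the confidence information.
    """
    return sum(
        all(t.get(item, 0.0) >= cutoff for item in itemset)
        for t in transactions
    )


def expected_support(itemset, transactions):
    """Sum, over transactions, the product of the items' confidences.

    Assumes item confidences are independent probabilities, which is
    an assumption of this sketch rather than a claim from the paper.
    """
    total = 0.0
    for t in transactions:
        p = 1.0
        for item in itemset:
            p *= t.get(item, 0.0)  # a missing item contributes 0
        total += p
    return total


if __name__ == "__main__":
    for itemset in [("a",), ("a", "b")]:
        print(
            itemset,
            thresholded_support(itemset, transactions),
            round(expected_support(itemset, transactions), 2),
        )
```

The two estimates can disagree about whether an itemset clears a given minimum support, which is one way the false positives and false negatives described above can arise.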

Yi Xia | Richard Muntz
