论文信息 - Frequent Itemset Counting Across Multiple Tables

Frequent Itemset Counting Across Multiple Tables

Available technology for mining data usually applies to centrally stored data (i.e., homogeneous, and in one single repository and schema). The few extensions to mining algorithms for decentralized data have largely been for load balancing. In this paper, we examine mining decentralized data for the task of finding frequent itemsets. In contrast to current techniques where data is first joined to form a single table, we exploit the inter-table foreign key relationships to obtain decentralized algorithms that execute concurrently on the separate tables, and thereafter, merge the results. In particular, for typical warehouse schema designs, our approach adapts standard algorithms, and works efficiently. We provide analyses and empirical validation for important cases to exhibit how our approach performs well. In doing so, we also compare two of our approaches in merging results from individual tables, and thereby, we exhibit certain memory vs I/O trade-offs that are inherent in merging of decentralized partial results.

Nandit Soparkar | Viviane Crestana-Jensen

[1] Nandit Soparkar,et al. Data organization and access for efficient data mining , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[2] Tomasz Imielinski,et al. Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[3] Wynne Hsu,et al. Integrating Classification and Association Rule Mining , 1998, KDD.

[4] Rajeev Motwani,et al. Dynamic itemset counting and implication rules for market basket data , 1997, SIGMOD '97.

[5] David Wai-Lok Cheung,et al. Efficient Mining of Association Rules in Distributed Databases , 1996, IEEE Trans. Knowl. Data Eng..

[6] Philip S. Yu,et al. An effective hash-based algorithm for mining association rules , 1995, SIGMOD '95.

[7] Viviane Crestana Jensen,et al. Mining decentralized data repositories. , 2001 .

[8] Ramakrishnan Srikant,et al. Mining quantitative association rules in large relational tables , 1996, SIGMOD '96.

[9] Abraham Silberschatz,et al. Database Systems Concepts , 1997 .

[10] Ramakrishnan Srikant,et al. Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[11] Shamkant B. Navathe,et al. An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.