Paper information - A parallel algorithm for computing borders

A parallel algorithm for computing borders

The border concept was introduced by Mannila and Toivonen in their seminal paper [20]. This concept finds many applications, e.g., maximal frequent itemsets, minimal functional dependencies, emerging patterns between consecutive database instances, and materialized view selection. For large transactional and relational databases defined on n items or attributes, the running time of any border computation is mainly dominated by the time T (for standard sequential algorithms) required to test the interestingness, typically the frequency, of sets of candidates. In this paper we propose a general parallel algorithm for computing borders, whatever the application. We prove the efficiency of our algorithm by showing that (i) it generates exactly the same number of candidates as the standard sequential algorithm, and (ii) if the interestingness test time of a candidate is bounded by Δ, then on a multi-processor shared-memory machine with p cores, the total interestingness time satisfies T_p < T/p + 2Δn. We implemented our algorithm in the maximal frequent itemset (MFI) mining setting, and our experiments confirm our theoretical performance guarantee.
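To make the setting concrete, the following is a minimal sketch of a standard levelwise (Apriori-style) search for maximal frequent itemsets, in which the interestingness tests of each level are handed to a pool of workers. It is not the algorithm proposed in the paper, whose details the abstract does not give. The names `is_frequent`, `positive_border`, `min_support` and `workers` are hypothetical, and the thread pool only stands in for the p cores of a shared-memory machine.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def is_frequent(itemset, transactions, min_support):
    """Interestingness test: is `itemset` contained in at least `min_support` transactions?"""
    return sum(1 for t in transactions if itemset <= t) >= min_support


def positive_border(transactions, min_support, workers=4):
    """Levelwise search returning the maximal frequent itemsets (illustrative sketch)."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items]  # candidates of size 1
    frequent = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while level:
            # The tests of one level are independent, so they can run concurrently.
            results = list(
                pool.map(lambda c: is_frequent(c, transactions, min_support), level)
            )
            survivors = {c for c, ok in zip(level, results) if ok}
            frequent |= survivors
            # Next level: unions of two survivors that are exactly one item larger,
            # kept only if every subset one item smaller is frequent.
            size = len(level[0]) + 1
            candidates = {a | b for a in survivors for b in survivors if len(a | b) == size}
            level = sorted(
                (c for c in candidates
                 if all(frozenset(s) in survivors for s in combinations(c, size - 1))),
                key=sorted,
            )
    # A frequent set is maximal if no frequent proper superset exists.
    return {s for s in frequent if not any(s < t for t in frequent)}
```

For instance, with the four transactions `abc`, `abc`, `abd`, `cd` (each a `frozenset` of single-letter items) and `min_support=2`, the function returns the two maximal frequent itemsets `{a, b, c}` and `{d}`. The number of candidates tested does not depend on `workers`; only the way the tests are scheduled changes.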

Nicolas Hanusse | Sofian Maabout

[1] Roberto J. Bayardo, et al. Efficiently mining long patterns from databases, 1998, SIGMOD '98.

[2] Osmar R. Zaïane, et al. Parallel leap: large-scale maximal pattern mining in a distributed environment, 2006, 12th International Conference on Parallel and Distributed Systems (ICPADS'06).

[3] Heikki Mannila, et al. Levelwise Search and Borders of Theories in Knowledge Discovery, 1997, Data Mining and Knowledge Discovery.

[4] Eric Li, et al. Optimization of Frequent Itemset Mining on Multiple-Core Processor, 2007, VLDB.

[5] Lotfi Lakhal, et al. Emerging Cubes: Borders, size estimations and lossless reductions, 2009, Inf. Syst.

[6] Georg Gottlob, et al. New Results on Monotone Dualization and Generating Hypergraph Transversals, 2003, SIAM J. Comput.

[7] Jean-Marc Petit, et al. A thorough experimental study of datasets for frequent itemsets, 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[8] Rosine Cicchetti, et al. FUN: An Efficient Algorithm for Mining Functional and Embedded Dependencies, 2001, ICDT.

[9] Heikki Mannila, et al. Standing Out in a Crowd: Selecting Attributes for Maximum Visibility, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[10] Edward L. Robertson, et al. FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances - Extended Abstract, 2001, DaWaK.

[11] Srinivasan Parthasarathy, et al. Cache-conscious frequent pattern mining on modern and emerging processors, 2007, The VLDB Journal.

[12] Takeaki Uno, et al. Enumerating Maximal Frequent Sets Using Irredundant Dualization, 2003, Discovery Science.

[13] Mohammed J. Zaki, et al. GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets, 2005, Data Mining and Knowledge Discovery.

[14] Cong Yu, et al. Constructing and exploring composite items, 2010, SIGMOD Conference.

[15] H. Mannila, et al. Discovering all most specific sentences, 2003, TODS.

[16] Zvi M. Kedem, et al. Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set, 1998, EDBT.

[17] Jian Pei, et al. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach, 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).

[18] Jiawei Han, et al. gSpan: graph-based substructure pattern mining, 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings.

[19] Johannes Gehrke, et al. MAFIA: a maximal frequent itemset algorithm for transactional databases, 2001, Proceedings 17th International Conference on Data Engineering.

[20] Nicolas Hanusse, et al. A view selection algorithm with performance guarantee, 2009, EDBT '09.

[21] Hannu Toivonen, et al. TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies, 1999, Comput. J.

[22] Jean-Marc Petit, et al. Efficient Discovery of Functional Dependencies and Armstrong Relations, 2000, EDBT.

[23] Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[24] Gösta Grahne, et al. Fast algorithms for frequent itemset mining using FP-trees, 2005, IEEE Transactions on Knowledge and Data Engineering.

[25] Ramesh C. Agarwal, et al. Depth first generation of long patterns, 2000, KDD '00.

[26] Soon Myoung Chung, et al. Efficient mining of maximal frequent itemsets from databases on a cluster of workstations, 2004, Knowledge and Information Systems.

[27] Jian Pei, et al. PADS: a simple yet effective pattern-aware dynamic search method for fast maximal frequent pattern mining, 2009, Knowledge and Information Systems.