A parallel algorithm for computing borders

The border concept was introduced by Mannila and Toivonen in their seminal paper [20]. It has many applications, e.g., maximal frequent itemsets, minimal functional dependencies, emerging patterns between consecutive database instances, and materialized view selection. For large transactional and relational databases defined on n items or attributes, the running time of any border computation is dominated by the time T (for standard sequential algorithms) required to test the interestingness, in general the frequency, of sets of candidates. In this paper we propose a general parallel algorithm for computing borders, whatever the application. We prove the efficiency of our algorithm by showing that (i) it generates exactly the same number of candidates as the standard sequential algorithm, and (ii) if the interestingness test time of a candidate is bounded by Δ, then on a shared-memory multiprocessor machine with p cores the total interestingness time Tp satisfies Tp < T/p + 2Δn. We implemented our algorithm in the maximal frequent itemset (MFI) mining setting, and our experiments confirm our theoretical performance guarantee.
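To make the setting concrete, the following is a minimal sketch of a standard levelwise (Apriori-style) border computation in the MFI setting, with the interestingness tests of each level dispatched to a pool of workers. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the function names (`support`, `maximal_frequent_itemsets`, `parallel_time_bound`), the per-level parallelization scheme, and the toy data are all hypothetical. The last function simply evaluates the right-hand side of the bound T/p + 2Δn.

```python
from itertools import combinations
from concurrent.futures import ThreadPoolExecutor


def support(itemset, transactions):
    """Interestingness test: number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def maximal_frequent_itemsets(transactions, min_support, workers=4):
    """Levelwise search; returns (maximal frequent itemsets, candidates tested).

    Assumption: candidates of one level are tested concurrently. Threads only
    illustrate the structure here; they do not model the paper's scheduling.
    """
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items]
    frequent, tested = [], 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while level:
            k = len(level[0])
            tested += len(level)
            counts = list(pool.map(lambda c: support(c, transactions), level))
            survivors = [c for c, n in zip(level, counts) if n >= min_support]
            frequent.extend(survivors)
            # Generate (k+1)-candidates whose k-subsets are all frequent.
            survivor_set = set(survivors)
            next_level = set()
            for a, b in combinations(survivors, 2):
                u = a | b
                if len(u) == k + 1 and all(
                    frozenset(s) in survivor_set for s in combinations(u, k)
                ):
                    next_level.add(u)
            level = sorted(next_level, key=sorted)
    # Positive border: frequent itemsets with no frequent proper superset.
    maximal = [f for f in frequent if not any(f < g for g in frequent)]
    return maximal, tested


def parallel_time_bound(T, p, delta, n):
    """Right-hand side of the stated guarantee: T/p + 2*delta*n."""
    return T / p + 2 * delta * n


# Toy database over n = 4 items (hypothetical data).
db = [frozenset(t) for t in ("abc", "ab", "ac", "bc", "abcd")]
mfi, tested = maximal_frequent_itemsets(db, min_support=3)
print(sorted("".join(sorted(s)) for s in mfi), tested)
print(parallel_time_bound(T=1000.0, p=8, delta=0.001, n=100))
```

On this toy database with a support threshold of 3, the sketch tests 8 candidates and reports ab, ac and bc as the maximal frequent itemsets. With the hypothetical figures T = 1000 s, p = 8, Δ = 1 ms and n = 100, the bound evaluates to 125.2 s, so the additive term 2Δn is small compared with T/p.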

[1] Roberto J. Bayardo, et al. Efficiently mining long patterns from databases, 1998, SIGMOD '98.

[2] Osmar R. Zaïane, et al. Parallel leap: large-scale maximal pattern mining in a distributed environment, 2006, 12th International Conference on Parallel and Distributed Systems (ICPADS'06).

[3] Heikki Mannila, et al. Levelwise Search and Borders of Theories in Knowledge Discovery, 1997, Data Mining and Knowledge Discovery.

[4] Eric Li, et al. Optimization of Frequent Itemset Mining on Multiple-Core Processor, 2007, VLDB.

[5] Lotfi Lakhal, et al. Emerging Cubes: Borders, size estimations and lossless reductions, 2009, Inf. Syst.

[6] Georg Gottlob, et al. New Results on Monotone Dualization and Generating Hypergraph Transversals, 2003, SIAM J. Comput.

[7] Jean-Marc Petit, et al. A thorough experimental study of datasets for frequent itemsets, 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[8] Rosine Cicchetti, et al. FUN: An Efficient Algorithm for Mining Functional and Embedded Dependencies, 2001, ICDT.

[9] Heikki Mannila, et al. Standing Out in a Crowd: Selecting Attributes for Maximum Visibility, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[10] Edward L. Robertson, et al. FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances - Extended Abstract, 2001, DaWaK.

[11] Srinivasan Parthasarathy, et al. Cache-conscious frequent pattern mining on modern and emerging processors, 2007, The VLDB Journal.

[12] Takeaki Uno, et al. Enumerating Maximal Frequent Sets Using Irredundant Dualization, 2003, Discovery Science.

[13] Mohammed J. Zaki, et al. GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets, 2005, Data Mining and Knowledge Discovery.

[14] Cong Yu, et al. Constructing and exploring composite items, 2010, SIGMOD Conference.

[15] H. Mannila, et al. Discovering all most specific sentences, 2003, TODS.

[16] Zvi M. Kedem, et al. Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set, 1998, EDBT.

[17] Jian Pei, et al. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach, 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).

[18] Jiawei Han, et al. gSpan: graph-based substructure pattern mining, 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings.

[19] Johannes Gehrke, et al. MAFIA: a maximal frequent itemset algorithm for transactional databases, 2001, Proceedings 17th International Conference on Data Engineering.

[20] Nicolas Hanusse, et al. A view selection algorithm with performance guarantee, 2009, EDBT '09.

[21] Hannu Toivonen, et al. TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies, 1999, Comput. J.

[22] Jean-Marc Petit, et al. Efficient Discovery of Functional Dependencies and Armstrong Relations, 2000, EDBT.

[23] Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[24] Gösta Grahne, et al. Fast algorithms for frequent itemset mining using FP-trees, 2005, IEEE Transactions on Knowledge and Data Engineering.

[25] Ramesh C. Agarwal, et al. Depth first generation of long patterns, 2000, KDD '00.

[26] Soon Myoung Chung, et al. Efficient mining of maximal frequent itemsets from databases on a cluster of workstations, 2004, Knowledge and Information Systems.

[27] Jian Pei, et al. PADS: a simple yet effective pattern-aware dynamic search method for fast maximal frequent pattern mining, 2009, Knowledge and Information Systems.