PGLCM: efficient parallel mining of closed frequent gradual itemsets

Numerical data (e.g., DNA microarray data, sensor data) are a challenge for existing frequent pattern mining methods, which handle them poorly. In this setting, gradual patterns have recently been proposed to extract covariations of attributes, such as: “When X increases, Y decreases”. Some algorithms exist for mining frequent gradual patterns, but they do not scale to real-world databases. In this paper we present GLCM, the first algorithm for mining closed frequent gradual patterns, which offers strong complexity guarantees: the mining time is linear in the number of closed frequent gradual itemsets. Our experimental study shows that GLCM is two orders of magnitude faster than the state of the art, with constant, low memory usage. We also present PGLCM, a parallelization of GLCM that exploits multicore processors and scales well on complex datasets. These are the first algorithms capable of mining large real-world datasets to discover gradual patterns.
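To make a pattern such as “When X increases, Y decreases” concrete, the sketch below scores one gradual pattern on a toy table. It assumes the support definition commonly used in the gradual-pattern literature: the length of the longest chain of records along which every attribute varies in the stated direction, divided by the number of records. The function names, the toy data, and the quadratic dynamic program are illustrative assumptions; this is not GLCM or PGLCM, only a brute-force way to evaluate a single pattern.

```python
from typing import List, Sequence, Tuple

# A gradual item is (attribute index, direction):
# direction +1 means "increases", -1 means "decreases".
GradualItem = Tuple[int, int]


def respects(a: Sequence[float], b: Sequence[float],
             pattern: Sequence[GradualItem]) -> bool:
    """True if moving from record a to record b follows every variation."""
    return all((b[attr] - a[attr]) * direction > 0
               for attr, direction in pattern)


def longest_chain(records: List[Sequence[float]],
                  pattern: Sequence[GradualItem]) -> int:
    """Length of the longest record sequence that respects the pattern."""
    if not pattern:
        raise ValueError("pattern must contain at least one gradual item")
    attr0, dir0 = pattern[0]
    # Sorting on the first item guarantees that any valid predecessor
    # of a record appears before it in the list.
    ordered = sorted(records, key=lambda r: r[attr0] * dir0)
    best = [1] * len(ordered)
    for i in range(len(ordered)):
        for j in range(i):
            if respects(ordered[j], ordered[i], pattern):
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)


def support(records: List[Sequence[float]],
            pattern: Sequence[GradualItem]) -> float:
    """Assumed support: longest chain length over number of records."""
    return longest_chain(records, pattern) / len(records)


# Hypothetical toy data: (age, salary, number of loans).
records = [
    (22, 1200, 4),
    (28, 1850, 2),
    (24, 1300, 3),
    (35, 2200, 1),
    (30, 2000, 5),
]

# "When age increases, the number of loans decreases."
age_up_loans_down = [(0, +1), (2, -1)]
print(support(records, age_up_loans_down))
```

On this toy table, four of the five records can be ordered so that age rises while the number of loans falls, so the pattern has support 4/5 under the assumed definition. Evaluating one pattern this way is quadratic in the number of records, which gives a rough sense of why scalable enumeration of all closed frequent gradual itemsets is the hard part.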
