Frequent closed itemset based algorithms: a thorough structural and analytical survey

As a side effect of the digitalization of an unprecedented amount of data, traditional retrieval tools have proved unable to extract hidden and valuable knowledge. Data mining, which promises adequate tools and techniques for this task, is the discovery of hidden information in datasets. In this paper, we present a structural and analytical survey of frequent closed itemset (FCI) based algorithms for mining association rules. We provide a structural classification of these algorithms into four categories and compare them using criteria that we introduce. We also present an analytical comparison of FCI-based algorithms on benchmark dense and sparse datasets as well as "worst case" datasets. Going beyond classical performance analysis, we focus on memory consumption and on the advantages and limitations of the optimization strategies used in FCI-based algorithms.
