MAFIA: a maximal frequent itemset algorithm

We present a new algorithm for mining maximal frequent itemsets from a transactional database. The search strategy of the algorithm integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms that significantly improve mining performance. Our implementation of support counting combines a vertical bitmap representation of the data with an efficient bitmap compression scheme. In a thorough experimental analysis, we isolate the effects of individual components of MAFIA, including search space pruning techniques and adaptive compression. We also compare our performance with previous work by running tests on very different types of data sets. Our experiments show that MAFIA performs best when mining long itemsets and outperforms other algorithms on dense data by a factor of 3 to 30.
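The two ideas named in the abstract, a vertical bitmap for support counting and a depth-first search with pruning, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_vertical_bitmaps` and `mine_maximal` are hypothetical, Python integers stand in for bitmaps, the only pruning shown is a simple superset check against maximal itemsets already found, and the bitmap compression scheme is omitted entirely.

```python
from typing import Dict, FrozenSet, List, Sequence, Set


def popcount(bits: int) -> int:
    """Number of set bits, i.e. the support of the itemset this bitmap represents."""
    return bin(bits).count("1")


def build_vertical_bitmaps(transactions: Sequence[Set[str]]) -> Dict[str, int]:
    """Map each item to an integer whose bit t is set iff transaction t contains the item."""
    bitmaps: Dict[str, int] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def mine_maximal(transactions: Sequence[Set[str]], min_support: int) -> List[FrozenSet[str]]:
    """Depth-first search for maximal frequent itemsets (illustrative sketch only)."""
    bitmaps = build_vertical_bitmaps(transactions)
    # Keep only frequent single items, in a fixed (lexicographic) order.
    items = sorted(i for i, b in bitmaps.items() if popcount(b) >= min_support)
    all_tids = (1 << len(transactions)) - 1
    maximal: List[FrozenSet[str]] = []

    def dfs(head: FrozenSet[str], head_bits: int, tail: List[str]) -> None:
        # Pruning: if head plus every remaining item is already covered by a
        # known maximal itemset, this subtree cannot contain a new one.
        if any((head | frozenset(tail)) <= m for m in maximal):
            return
        extended = False
        for pos, item in enumerate(tail):
            # Support of head + item is the popcount of the ANDed bitmaps.
            new_bits = head_bits & bitmaps[item]
            if popcount(new_bits) >= min_support:
                extended = True
                dfs(head | {item}, new_bits, tail[pos + 1:])
        # A node with no frequent extension is maximal unless an earlier
        # branch already produced a superset of it.
        if not extended and head and not any(head <= m for m in maximal):
            maximal.append(head)

    dfs(frozenset(), all_tids, items)
    return maximal


transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"b", "c", "d"},
    {"a", "b", "c", "d"},
]
result = mine_maximal(transactions, min_support=3)
print(sorted(sorted(m) for m in result))
# prints [['a', 'b', 'c'], ['a', 'b', 'd'], ['b', 'c', 'd']]
```

Storing one bitmap per item means the support of any candidate is a bitwise AND followed by a bit count, which is why the vertical layout pairs naturally with a depth-first traversal: each recursive call carries the bitmap of its current itemset and extends it by one item at a time.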
