Towards Generic Pattern Mining

Frequent Pattern Mining (FPM) is a powerful paradigm for mining informative and useful patterns in massive, complex datasets. In this paper we propose the Data Mining Template Library (DMTL), a collection of generic containers and algorithms for FPM, together with persistency and database management classes. DMTL provides a systematic solution to a whole class of common FPM tasks, such as itemset, sequence, tree, and graph mining. DMTL is extensible and scalable, and it is designed for high performance and rapid response on massive datasets. Our experiments show that DMTL is competitive with special-purpose algorithms designed for a particular pattern type, especially as database sizes increase.
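To make the idea of a generic mining algorithm concrete, the following is a minimal sketch of how one level-wise search routine might serve several pattern types. It is not DMTL's API: Python is used only for brevity, and every name here (`mine_frequent`, `extend_itemset`, `sequence_contained`, the toy alphabet and databases) is a hypothetical illustration. The sketch assumes that support is anti-monotone under the chosen extension step and that `min_support` is at least 1.

```python
from typing import Callable, Dict, Hashable, Iterable, Sequence, TypeVar

P = TypeVar("P", bound=Hashable)  # pattern type (itemset, sequence, ...)
T = TypeVar("T")                  # database object type


def mine_frequent(
    database: Sequence[T],
    seeds: Iterable[P],
    extend: Callable[[P], Iterable[P]],
    contains: Callable[[T, P], bool],
    min_support: int,
) -> Dict[P, int]:
    """Level-wise search that only knows patterns through `extend` and `contains`."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    frequent: Dict[P, int] = {}
    seen = set()
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for pattern in frontier:
            if pattern in seen:
                continue  # never count the same candidate twice
            seen.add(pattern)
            support = sum(1 for obj in database if contains(obj, pattern))
            if support >= min_support:
                frequent[pattern] = support
                # Only frequent patterns are extended (anti-monotone pruning).
                next_frontier.extend(extend(pattern))
        frontier = next_frontier
    return frequent


ALPHABET = ("a", "b", "c")  # hypothetical item alphabet
SEEDS = [(x,) for x in ALPHABET]


# Itemsets: sorted tuples, extended with items larger than the last one.
def extend_itemset(p):
    return [p + (x,) for x in ALPHABET if x > p[-1]]


def itemset_contained(transaction, p):
    return set(p) <= transaction


# Sequences: ordered tuples, extended with any item; containment is subsequence.
def extend_sequence(p):
    return [p + (x,) for x in ALPHABET]


def sequence_contained(seq, p):
    it = iter(seq)
    return all(x in it for x in p)  # consumes `it`, so order is respected


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
itemsets = mine_frequent(transactions, SEEDS, extend_itemset, itemset_contained, 2)

sequence_db = [("a", "b", "c"), ("a", "c"), ("b", "a", "c")]
sequences = mine_frequent(sequence_db, SEEDS, extend_sequence, sequence_contained, 2)
```

The same `mine_frequent` routine handles both pattern types; only the extension and containment functions change. A template library can make the same separation at compile time, with the pattern type supplied as a template parameter rather than as function arguments.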
