TCS Technical Report ZBDD-growth: An Efficient Method for Frequent Pattern Mining and Knowledge Indexing

(Abstract) Frequent pattern mining is one of the fundamental techniques for knowledge discovery and data mining. In the last decade, a number of efficient algorithms for frequent pattern mining have been presented, but most of them focus only on enumerating the patterns that satisfy the given conditions; how to store and index the resulting patterns for efficient data analysis has been treated as a separate matter. In this paper, we propose a fast algorithm that extracts all/maximal frequent patterns from transaction databases and simultaneously indexes the resulting huge set of patterns using Zero-suppressed BDDs (ZBDDs). Our method, ZBDD-growth, is competitive in speed with existing state-of-the-art algorithms, and it not only enumerates/lists the patterns but also indexes the output data compactly in memory. After mining, the resulting patterns can be analyzed efficiently using algebraic operations. BDD data structures have already been used successfully in VLSI logic design systems, but our method is the first practical work applying BDD-based techniques to the data mining area.
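To make the idea of indexing a set of patterns with a ZBDD and then analyzing it with algebraic operations more concrete, the following is a minimal Python sketch. It is not the ZBDD-growth algorithm and not the authors' implementation: the class `ZBDD`, the helper `family`, and all method names are hypothetical and chosen only for illustration. The sketch assumes integer item IDs ordered from smallest (top) to largest (bottom) and omits the operation cache that a practical BDD package would use.

```python
# Minimal ZBDD sketch (illustrative only; names are hypothetical).
EMPTY, BASE = 0, 1  # terminals: the empty family {} and the family {∅}


class ZBDD:
    def __init__(self):
        self.nodes = [None, None]  # ids 0 and 1 are reserved for terminals
        self.unique = {}           # (var, lo, hi) -> node id, for sharing

    def node(self, var, lo, hi):
        if hi == EMPTY:            # zero-suppression rule: skip the node
            return lo
        key = (var, lo, hi)
        if key not in self.unique:
            self.nodes.append(key)
            self.unique[key] = len(self.nodes) - 1
        return self.unique[key]

    def _top(self, f):
        # Terminals are treated as having an infinitely large variable.
        return self.nodes[f][0] if f > BASE else float("inf")

    def from_set(self, items):
        # Family containing exactly one itemset, built bottom-up.
        f = BASE
        for v in sorted(items, reverse=True):
            f = self.node(v, EMPTY, f)
        return f

    def union(self, f, g):
        if f == EMPTY:
            return g
        if g == EMPTY or f == g:
            return f
        fv, gv = self._top(f), self._top(g)
        if fv > gv:                # make f the one with the smaller top var
            f, g, fv, gv = g, f, gv, fv
        _, flo, fhi = self.nodes[f]
        if fv < gv:
            return self.node(fv, self.union(flo, g), fhi)
        _, glo, ghi = self.nodes[g]
        return self.node(fv, self.union(flo, glo), self.union(fhi, ghi))

    def intersect(self, f, g):
        if f == EMPTY or g == EMPTY:
            return EMPTY
        if f == g:
            return f
        fv, gv = self._top(f), self._top(g)
        if fv < gv:
            return self.intersect(self.nodes[f][1], g)
        if fv > gv:
            return self.intersect(f, self.nodes[g][1])
        _, flo, fhi = self.nodes[f]
        _, glo, ghi = self.nodes[g]
        return self.node(fv, self.intersect(flo, glo),
                         self.intersect(fhi, ghi))

    def count(self, f):
        # Number of itemsets in the family (no memoization in this sketch).
        if f <= BASE:
            return f
        _, lo, hi = self.nodes[f]
        return self.count(lo) + self.count(hi)

    def sets(self, f):
        # Enumerate the family as a list of frozensets.
        if f == EMPTY:
            return []
        if f == BASE:
            return [frozenset()]
        var, lo, hi = self.nodes[f]
        return self.sets(lo) + [s | {var} for s in self.sets(hi)]


def family(z, itemsets):
    # Build a family of itemsets by repeated union.
    f = EMPTY
    for s in itemsets:
        f = z.union(f, z.from_set(s))
    return f


z = ZBDD()
F = family(z, [{1, 2}, {1, 3}, {2}])   # e.g. patterns from one mining run
G = family(z, [{1, 2}, {3}])           # e.g. patterns from another run
common = z.intersect(F, G)             # patterns present in both results
merged = z.union(F, G)                 # patterns present in either result
```

Because nodes are shared through the unique table, two families with the same contents reduce to the same node id regardless of construction order, so an equality check between pattern sets is a single integer comparison in this sketch.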
