Generalized association rule mining using an efficient data structure

Research highlights: (1) We designed a data structure for generating association rules between items at different levels of a taxonomy tree. (2) The proposed algorithms generate fewer candidate itemsets. (3) The method prunes a large number of irrelevant rules based on the minimum confidence.

The goal of this paper is to use an efficient data structure to find generalized association rules between items at different levels of a taxonomy tree, under the assumption that the original frequent itemsets and association rules were generated in advance. The primary challenge in designing an efficient mining algorithm is how to use the original frequent itemsets and association rules to generate new generalized association rules directly, rather than rescanning the database. In this paper, we use an efficient data structure called the frequent closed enumeration table (FCET) to store the relevant information. It stores only maximal itemsets, and the information for any subset of a maximal itemset can be derived from it through a hash function. The proposed algorithms, GMAR and GMFI, use join methods and/or pruning techniques to generate new generalized association rules. In several comprehensive experiments, both algorithms considerably outperformed the BASIC and Cumulate algorithms, even when those algorithms also used the FCET, because GMAR and GMFI generate fewer candidate itemsets. Furthermore, the GMAR algorithm prunes a large number of irrelevant rules based on the minimum confidence.
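To make the notion of a generalized association rule concrete, the following Python sketch shows the straightforward baseline idea: extend each transaction with the taxonomy ancestors of its items, then keep the rules that meet the minimum support and minimum confidence. It is only an illustration under stated assumptions. It is not the FCET, GMAR, or GMFI of this paper, it rescans the extended transactions for every candidate (the cost the paper aims to avoid), and it is limited to rules with one item on each side. The taxonomy, the transactions, the thresholds, and all function names are hypothetical.

```python
from itertools import permutations

# Hypothetical taxonomy (child -> parent); not taken from the paper.
TAXONOMY = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
    "shoes": "footwear",
    "hiking boots": "footwear",
}

# Hypothetical transaction database of leaf-level items.
DB = [
    {"shirt"},
    {"jacket", "hiking boots"},
    {"ski pants", "hiking boots"},
    {"shoes"},
    {"shoes"},
    {"jacket"},
]


def ancestors(item, taxonomy):
    """Return all ancestors of an item, nearest first."""
    result = []
    while item in taxonomy:
        item = taxonomy[item]
        result.append(item)
    return result


def extend(transaction, taxonomy):
    """Add every ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, taxonomy))
    return frozenset(extended)


def support(itemset, extended_db):
    """Fraction of extended transactions that contain the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in extended_db) / len(extended_db)


def generalized_rules(extended_db, taxonomy, min_sup, min_conf):
    """Enumerate one-to-one rules x -> y across taxonomy levels."""
    items = sorted(set().union(*extended_db))
    rules = []
    for x, y in permutations(items, 2):
        # A rule between an item and its own ancestor is trivial; skip it.
        if x in ancestors(y, taxonomy) or y in ancestors(x, taxonomy):
            continue
        sup_xy = support({x, y}, extended_db)
        conf = sup_xy / support({x}, extended_db)
        # Prune by minimum support first, then by minimum confidence.
        if sup_xy >= min_sup and conf >= min_conf:
            rules.append((x, y, sup_xy, conf))
    return rules


extended_db = [extend(t, TAXONOMY) for t in DB]
rules = generalized_rules(extended_db, TAXONOMY, min_sup=0.3, min_conf=0.6)
for x, y, sup, conf in rules:
    print(f"{x} -> {y} (support={sup:.2f}, confidence={conf:.2f})")
```

In this toy data no leaf-level pair is frequent, yet rules such as "outerwear -> hiking boots" emerge once ancestors are considered, which is the motivation for mining across taxonomy levels. The confidence check then discards candidates such as "footwear -> outerwear" that are frequent but too weak.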
