TCOM, an innovative data structure for mining association rules among infrequent items

Association rule mining is one of the most important areas of data mining and has received a great deal of attention. Its purpose is to discover association relationships or correlations among a set of items. In this paper, we present an efficient way to find valid association rules among infrequent items, a problem that is seldom addressed and whose importance is often overlooked by other researchers. We design a new data structure, called the Transactional Co-Occurrence Matrix (TCOM), which is built in two passes over the original transactional database. The occurrence counts of itemsets and the valid association rules are then mined from TCOM, which combines the advantages of both the transaction-oriented (horizontal) and the item-oriented (vertical) layouts of the database. As a result, any itemset can be randomly accessed and counted without a full scan of either the original database or TCOM, which significantly improves the efficiency of the mining process.
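The abstract does not spell out TCOM's internal layout. The following is therefore only a minimal Python sketch of the general idea under stated assumptions, not the authors' structure. Pass 1 counts item occurrences. Pass 2 stores each transaction as a row (horizontal layout) and each item's set of transaction ids (vertical layout). The names `build_tcom`, `count_itemset`, and `infrequent_pair_rules`, the thresholds `max_support` and `min_confidence`, and the toy transactions are all hypothetical. An itemset's count comes from intersecting only the transaction-id sets of its own items, so neither the transactions nor unrelated items are scanned. Support and confidence use the usual definitions: support(X) = count(X) / N and confidence(X -> Y) = count(X ∪ Y) / count(X).

```python
from collections import Counter
from itertools import combinations


def build_tcom(transactions):
    """Hypothetical two-pass index in the spirit of TCOM (not the authors' layout).

    Returns (rows, tid_lists): rows[tid] lists a transaction's items
    (horizontal view); tid_lists[item] is the set of transaction ids that
    contain the item (vertical view).
    """
    # Pass 1: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    # Rank items from rarest to most common (ties broken by name).
    rank = {item: r for r, item in
            enumerate(sorted(counts, key=lambda i: (counts[i], i)))}
    # Pass 2: fill both layouts in a single scan.
    rows = []
    tid_lists = {item: set() for item in rank}
    for tid, t in enumerate(transactions):
        items = sorted(set(t), key=rank.__getitem__)
        rows.append(items)
        for item in items:
            tid_lists[item].add(tid)
    return rows, tid_lists


def count_itemset(tid_lists, itemset):
    """Occurrence count of a non-empty itemset, read only from its items' tid sets."""
    if not itemset:
        raise ValueError("itemset must be non-empty")
    # Intersect the shortest sets first so the running result stays small.
    lists = sorted((tid_lists.get(i, set()) for i in itemset), key=len)
    common = set(lists[0])
    for tids in lists[1:]:
        common &= tids
    return len(common)


def infrequent_pair_rules(tid_lists, n_transactions, max_support, min_confidence):
    """Rules x -> y between two items whose individual support is below max_support."""
    rare = sorted(i for i, tids in tid_lists.items()
                  if len(tids) / n_transactions < max_support)
    rules = []
    for a, b in combinations(rare, 2):
        both = count_itemset(tid_lists, [a, b])
        if both == 0:
            continue
        # Check the rule in both directions: x -> y and y -> x.
        for x, y in ((a, b), (b, a)):
            confidence = both / len(tid_lists[x])
            if confidence >= min_confidence:
                rules.append((x, y, both / n_transactions, confidence))
    return rules


transactions = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["caviar", "champagne"],
    ["bread", "caviar", "champagne"],
    ["milk", "bread"],
]
rows, tid_lists = build_tcom(transactions)
print(rows[4])                                            # ['caviar', 'champagne', 'bread']
print(count_itemset(tid_lists, ["caviar", "champagne"]))  # 2
for rule in infrequent_pair_rules(tid_lists, len(transactions), 0.4, 0.8):
    print(rule)
```

On this toy data, caviar and champagne each appear in only a third of the transactions, so a miner with a minimum support of 0.4 would discard them. The sketch nevertheless reports caviar -> champagne and champagne -> caviar with confidence 1.0. Restricting the search to item pairs is a simplification for brevity; longer itemsets would be counted the same way with `count_itemset`.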
