Mining top-k strongly correlated item pairs without minimum correlation threshold

Given a user-specified minimum correlation threshold and a transaction database, the problem of mining strongly correlated item pairs is to find all item pairs whose Pearson's correlation coefficients exceed the threshold. However, choosing such a threshold is by no means easy. In this paper, we consider a more practical problem: mining the top-k strongly correlated item pairs, where k is the desired number of item pairs with the largest correlation values. Based on the FP-tree data structure, we propose an efficient algorithm, called Tkcp, for mining such patterns without a minimum correlation threshold. Our experimental results show that the Tkcp algorithm outperforms the Taper algorithm, an efficient algorithm for mining correlated item pairs, even when Taper is assumed to use an optimally chosen correlation threshold. We therefore conclude that mining the top-k strongly correlated pairs without a minimum correlation threshold is preferable to the original threshold-based mining.
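To make the problem statement concrete, the sketch below is a naive brute-force baseline, not Tkcp: it does not use an FP-tree and simply enumerates every item pair. It assumes that each transaction is a list of sortable item labels and that Pearson's correlation between two binary items is computed from their relative supports (the phi coefficient). Instead of comparing each pair against a correlation threshold, it keeps the k best pairs in a min-heap. The function names `phi` and `top_k_correlated_pairs` are hypothetical and chosen for illustration.

```python
import heapq
import math
from collections import Counter
from itertools import combinations


def phi(sup_a, sup_b, sup_ab):
    """Pearson's correlation of two binary items from relative supports:
    (sup(A,B) - sup(A)sup(B)) / sqrt(sup(A)sup(B)(1-sup(A))(1-sup(B)))."""
    denom = math.sqrt(sup_a * sup_b * (1 - sup_a) * (1 - sup_b))
    if denom == 0:
        # Item occurs in all or no transactions: correlation undefined, treat as 0.
        return 0.0
    return (sup_ab - sup_a * sup_b) / denom


def top_k_correlated_pairs(transactions, k):
    """Hypothetical brute-force baseline: return the k item pairs with the
    largest phi values as (phi, (a, b)) tuples, highest first.
    No correlation threshold is needed, only k."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))  # dedupe, and sort so pair keys are (a, b) with a < b
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    heap = []  # min-heap holding the k best (phi, pair) entries seen so far
    for a, b in combinations(sorted(item_count), 2):
        r = phi(item_count[a] / n, item_count[b] / n, pair_count[(a, b)] / n)
        if len(heap) < k:
            heapq.heappush(heap, (r, (a, b)))
        elif r > heap[0][0]:
            # New pair beats the weakest of the current top k: swap it in.
            heapq.heapreplace(heap, (r, (a, b)))
    return sorted(heap, reverse=True)
```

For example, with transactions `[["a", "b"], ["a", "b"], ["c"], ["c"]]` and `k = 1`, the pair `("a", "b")` always co-occurs and is returned with phi equal to 1.0. This baseline counts every pair, which is quadratic in the number of items. The abstract describes Tkcp as avoiding that cost by building on the FP-tree instead.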
