Efficient mining of strongly correlated item pairs

Past attempts to mine transactional databases for strongly correlated item pairs have faced a trade-off. Some algorithms gain efficiency at the cost of false positives and false negatives; others are accurate and comprehensive but sacrifice efficiency. We propose an efficient new algorithm that uses Jaccard's correlation coefficient, which is simply the ratio between the sizes of the intersection and the union of two sets, to generate a set of strongly correlated item pairs that is both accurate and comprehensive. Efficiency comes from pruning candidate item pairs based on an upper bound on the coefficient, and because pruning relies only on that bound, there is no possibility of false positives or false negatives. Tests of our algorithm on datasets of various sizes show its effectiveness in real-world applications.
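The idea of scoring pairs with the Jaccard coefficient and pruning candidates with an upper bound can be illustrated with a minimal sketch. This is not the algorithm from the paper: the abstract does not state which bound is used, so the sketch assumes the elementary bound J(X, Y) <= min(supp X, supp Y) / max(supp X, supp Y), which holds because the intersection is no larger than the smaller set and the union is no smaller than the larger one. The function names `jaccard` and `correlated_pairs`, the `threshold` parameter, and the sample transactions are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def correlated_pairs(transactions, threshold):
    """Return {(x, y): score} for item pairs whose Jaccard score >= threshold.

    Assumption (not taken from the paper): candidates are pruned with the
    bound min(supp) / max(supp), which can never underestimate the score,
    so pruning cannot discard a qualifying pair.
    """
    # Map each item to the set of transaction ids that contain it.
    tids = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault(item, set()).add(tid)

    result = {}
    for x, y in combinations(sorted(tids), 2):
        sx, sy = len(tids[x]), len(tids[y])
        upper = min(sx, sy) / max(sx, sy)
        if upper < threshold:
            continue  # the exact score cannot reach the threshold
        score = jaccard(tids[x], tids[y])
        if score >= threshold:
            result[(x, y)] = score
    return result


# Hypothetical toy database of five transactions.
sample = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}, {"c"}]
pairs = correlated_pairs(sample, threshold=0.5)
```

Because every surviving candidate is scored exactly, the result contains no false positives, and because the bound never falls below the true coefficient, no qualifying pair is pruned. The sketch still enumerates all pairs, so it shows only the correctness of bound-based pruning, not the efficiency of the proposed algorithm.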