Mining Flipping Correlations from Large Datasets with Taxonomies

In this paper we introduce a new type of pattern: the flipping correlation pattern. Flipping patterns are obtained by contrasting the correlations between items at different levels of abstraction. They represent surprising correlations, both positive and negative, that are specific to a given abstraction level and that "flip" from positive to negative, or vice versa, when the items are generalized to a higher level of abstraction. We design an efficient algorithm for finding flipping correlations, the Flipper algorithm, which outperforms naive pattern mining methods by several orders of magnitude. We apply Flipper to real-life datasets and show that the discovered patterns are non-redundant, surprising, and actionable. Flipper finds strong contrasting correlations in itemsets with low-to-medium support, a frequency range in which existing techniques cannot handle pattern discovery.
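To make the notion of a "flip" concrete, the following is a minimal brute-force sketch on a toy two-level taxonomy. It is not the Flipper algorithm. The abstract names neither a correlation measure nor thresholds, so the Kulczynski measure, the cut-offs `pos=0.6` and `neg=0.3`, the taxonomy, the transactions, and all function names are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy taxonomy: leaf item -> parent category.
TAXONOMY = {
    "cola": "drinks", "beer": "drinks",
    "chips": "snacks", "nuts": "snacks",
}

# Hypothetical transactions: beer and nuts are rare but always bought together,
# while drinks and snacks as categories seldom co-occur.
TRANSACTIONS = (
    [frozenset({"beer", "nuts"})] * 2
    + [frozenset({"cola"})] * 5
    + [frozenset({"chips"})] * 5
)

def support(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def kulczynski(transactions, a, b):
    """Assumed measure: average of P(a|b) and P(b|a), in [0, 1]."""
    sup_a = support(transactions, {a})
    sup_b = support(transactions, {b})
    if sup_a == 0 or sup_b == 0:
        return 0.0
    sup_ab = support(transactions, {a, b})
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)

def label(value, pos, neg):
    """Map a correlation value to a sign using assumed thresholds."""
    if value >= pos:
        return "positive"
    if value <= neg:
        return "negative"
    return "neutral"

def generalize(transactions, taxonomy):
    """Replace every leaf item by its parent category."""
    return [frozenset(taxonomy[i] for i in t) for t in transactions]

def flipping_pairs(transactions, taxonomy, pos=0.6, neg=0.3):
    """Leaf pairs whose correlation sign is opposite to that of their parents."""
    parents = generalize(transactions, taxonomy)
    flips = []
    for a, b in combinations(sorted(taxonomy), 2):
        pa, pb = taxonomy[a], taxonomy[b]
        if pa == pb:
            continue  # siblings generalize to a single item, nothing to contrast
        leaf = label(kulczynski(transactions, a, b), pos, neg)
        parent = label(kulczynski(parents, pa, pb), pos, neg)
        if {leaf, parent} == {"positive", "negative"}:
            flips.append((a, b, leaf, parent))
    return flips

if __name__ == "__main__":
    for a, b, leaf, parent in flipping_pairs(TRANSACTIONS, TAXONOMY):
        print(f"{a} & {b}: {leaf} at leaf level, {parent} at category level")
```

In this toy data, beer and nuts always occur together (Kulczynski value 1.0), whereas drinks and snacks co-occur in only 2 of the 7 transactions containing each (value 2/7), so the pair flips from positive to negative on generalization. Enumerating all pairs at every level, as done here, is the kind of naive approach the abstract says Flipper outperforms.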
