Correlated pattern mining in quantitative databases

We study mining correlations from quantitative databases and show that this is a more effective approach than mining associations for discovering useful patterns. We propose the novel notion of a quantitative correlated pattern (QCP), which is founded on two formal concepts: mutual information and all-confidence. We first devise a normalization of mutual information and apply it to QCP mining to capture the dependency between attributes. We further adopt all-confidence as a quality measure to ensure, at a finer granularity, the dependency between attributes within specific quantitative intervals. We also propose an effective supervised method that combines consecutive intervals of the quantitative attributes based on mutual information, so that interval combining is guided by the dependency between the attributes. We develop an algorithm, QCoMine, that mines QCPs efficiently by using normalized mutual information and all-confidence to perform bilevel pruning. We also identify redundancy in the set of QCPs and propose effective techniques to eliminate it. Our extensive experiments on both real and synthetic datasets verify the efficiency of QCoMine and the quality of the QCPs, and they also confirm the effectiveness of our redundancy-elimination techniques. To further demonstrate the usefulness and quality of QCPs, we study an application of QCPs to classification and show that the classifier built on QCPs achieves higher accuracy than state-of-the-art classifiers built on association rules.
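The two measures that QCPs are founded on can be illustrated with a short Python sketch. Mutual information and all-confidence below follow their standard definitions (all-confidence of a pattern P is supp(P) divided by the largest single-item support in P). The normalization shown, I(X;Y) / min(H(X), H(Y)), is an assumption chosen for illustration because it maps mutual information into [0, 1]; the normalization devised in this work may differ. The record layout (dicts of numeric values) and the (attribute, lo, hi) item encoding are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Record = Dict[str, float]
Item = Tuple[str, float, float]  # hypothetical encoding: (attribute, lo, hi) means value in [lo, hi]


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy H(X), in bits, of a discretized attribute."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X;Y), in bits, estimated from paired observations of X and Y."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log2(p(x,y) / (p(x) p(y))) == (c/n) * log2(c * n / (count(x) * count(y)))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in Counter(zip(xs, ys)).items()
    )


def normalized_mi(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """ASSUMED normalization: I(X;Y) / min(H(X), H(Y)), which lies in [0, 1]."""
    denom = min(entropy(xs), entropy(ys))
    return 0.0 if denom == 0 else mutual_information(xs, ys) / denom


def support(records: List[Record], items: Sequence[Item]) -> int:
    """Number of records whose value lies in [lo, hi] for every item."""
    return sum(all(lo <= r[a] <= hi for a, lo, hi in items) for r in records)


def all_confidence(records: List[Record], pattern: Sequence[Item]) -> float:
    """supp(P) divided by the maximum single-item support over items in P."""
    max_single = max(support(records, [item]) for item in pattern)
    return 0.0 if max_single == 0 else support(records, pattern) / max_single


if __name__ == "__main__":
    records = [
        {"age": 25, "salary": 30},
        {"age": 28, "salary": 35},
        {"age": 45, "salary": 80},
        {"age": 52, "salary": 90},
        {"age": 27, "salary": 85},
    ]
    pattern = [("age", 20, 30), ("salary", 25, 40)]
    print(round(all_confidence(records, pattern), 3))  # 0.667 (= 2 / 3)
    # Coarse, hypothetical discretization of each attribute before computing NMI.
    ages = [int(r["age"] >= 40) for r in records]
    salaries = [int(r["salary"] >= 50) for r in records]
    print(round(normalized_mi(ages, salaries), 3))
```

Because adding an item to a pattern can only lower its support and can only raise its largest single-item support, all-confidence never increases as a pattern grows, which is what makes it usable for pruning candidates during a search. How QCoMine combines it with normalized mutual information for bilevel pruning is not reproduced in this sketch.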
