An information-theoretic approach to quantitative association rule mining

Quantitative association rule (QAR) mining has been recognized as an influential research problem over the last decade, owing to the popularity of quantitative databases and the usefulness of association rules in real life. Unlike boolean association rules (BARs), which consider only boolean attributes, QARs involve quantitative attributes, which carry much richer information. However, combining these quantitative attributes with their value intervals gives rise to an explosively large number of itemsets, which severely degrades mining efficiency. In this paper, we propose an information-theoretic approach that avoids generating unrewarding combinations of attributes and value intervals during the mining process. We study the mutual information between the attributes in a quantitative database and devise a normalization of the mutual information to make it applicable in the context of QAR mining. To capture the strong informative relationships among the attributes, we construct a mutual information graph (MI graph), whose edges are the attribute pairs with normalized mutual information no less than a predefined information threshold. We find that the cliques in the MI graph represent a majority of the frequent itemsets. We also show that the frequent itemsets that do not form a clique in the MI graph are those whose attributes are not informatively correlated with each other. By utilizing the cliques in the MI graph, we devise an efficient algorithm, MIC, that significantly reduces the number of value intervals of the attribute sets to be joined during the mining process. Extensive experiments show that our algorithm speeds up the mining process by up to two orders of magnitude. Most importantly, we are able to obtain most of the high-confidence QARs, whereas the QARs that are not returned by MIC are shown to be less interesting.
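
The MI graph construction described above can be illustrated with a minimal sketch. The abstract does not state the exact normalization, so the code below assumes one common choice, I(X;Y) / min(H(X), H(Y)); the function names, the toy table, and the threshold of 0.5 are all hypothetical and are not taken from the paper. Attributes are assumed to be already discretized into interval labels, and cliques are enumerated with a plain Bron-Kerbosch search rather than the paper's own procedure.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(column):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned columns."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def normalized_mi(x, y):
    """ASSUMED normalization: I(X;Y) / min(H(X), H(Y)), in [0, 1]."""
    denom = min(entropy(x), entropy(y))
    return mutual_information(x, y) / denom if denom > 0 else 0.0


def build_mi_graph(table, threshold):
    """Undirected graph: an edge joins two attributes whose normalized
    mutual information is no less than the threshold."""
    adj = {name: set() for name in table}
    for a, b in combinations(sorted(table), 2):
        if normalized_mi(table[a], table[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical toy table: each column holds interval labels per record.
table = {
    "age": [0, 0, 1, 1],
    "income": [0, 0, 1, 1],  # fully determined by "age"
    "noise": [0, 1, 0, 1],   # independent of the other two
}
graph = build_mi_graph(table, threshold=0.5)
print(maximal_cliques(graph))  # -> [['age', 'income'], ['noise']]
```

In this sketch only the attribute sets that form cliques (here, age and income) would be considered for joining their value intervals, which is the kind of pruning the abstract attributes to the MI graph.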
