Behavior-based clustering and analysis of interestingness measures for association rule mining

A number of studies, theoretical, empirical, or both, have been conducted to provide insight into the properties and behavior of interestingness measures for association rule mining. While each has value in its own right, most are either limited in scope or, more importantly, ignore the purpose for which interestingness measures are intended, namely the ultimate ranking of discovered association rules. This paper, therefore, focuses on an analysis of the rule-ranking behavior of 61 well-known interestingness measures tested on the rules generated from 110 different datasets. By clustering based on ranking behavior, we highlight, and formally prove, previously unreported equivalences among interestingness measures. We also show that there appear to be distinct clusters of interestingness measures, but that there remain differences among clusters, confirming that domain knowledge is essential to the selection of an appropriate interestingness measure for a particular task and business objective.
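
To make the idea of comparing measures by their rule-ranking behavior concrete, the following is a minimal sketch, not the paper's procedure. It assumes Spearman rank correlation as the similarity between two measures, which the abstract does not specify, and it uses five hypothetical rules given as 2x2 contingency counts. The four measures (confidence, lift, odds ratio, Yule's Q) are chosen only for illustration. Yule's Q equals (OR - 1) / (OR + 1), an increasing function of the odds ratio OR, so these two must rank any set of rules identically; this is a well-known relationship, not one of the equivalences reported in the paper.

```python
from itertools import combinations

# Hypothetical rules A -> B as contingency counts (n11, n10, n01, n00):
# n11 = A and B, n10 = A and not B, n01 = not A and B, n00 = neither.
RULES = [(40, 10, 20, 30), (30, 30, 10, 30), (20, 5, 50, 25),
         (10, 40, 10, 40), (45, 5, 40, 10)]

def confidence(n11, n10, n01, n00):
    return n11 / (n11 + n10)

def lift(n11, n10, n01, n00):
    total = n11 + n10 + n01 + n00
    return confidence(n11, n10, n01, n00) / ((n11 + n01) / total)

def odds_ratio(n11, n10, n01, n00):
    return (n11 * n00) / (n10 * n01)

def yule_q(n11, n10, n01, n00):
    return (n11 * n00 - n10 * n01) / (n11 * n00 + n10 * n01)

MEASURES = {"confidence": confidence, "lift": lift,
            "odds_ratio": odds_ratio, "yule_q": yule_q}

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Score every rule under every measure, then compare measures pairwise.
scores = {name: [f(*rule) for rule in RULES] for name, f in MEASURES.items()}
distance = {(a, b): 1.0 - spearman(scores[a], scores[b])
            for a, b in combinations(MEASURES, 2)}

for pair, d in sorted(distance.items(), key=lambda item: item[1]):
    print(pair, round(d, 3))
```

A pairwise distance table of this kind is the sort of input a hierarchical clustering routine could take; pairs at distance zero are candidates for a formal proof of rank equivalence.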
