Selecting the right interestingness measure for association patterns

Many techniques for association rule mining and feature selection require a suitable metric to capture the dependencies among variables in a data set. For example, metrics such as support, confidence, lift, correlation, and collective strength are often used to determine the interestingness of association patterns. However, many of these measures provide conflicting information about the interestingness of a pattern, and the best metric for a given application domain is rarely known. In this paper, we present an overview of various measures proposed in the statistics, machine learning, and data mining literature. We describe several key properties one should examine in order to select the right measure for a given application domain. We compare twenty-one existing measures with respect to these properties. We show that each measure has different properties, which make it useful for some application domains but not for others. We also present two scenarios in which most of the existing measures agree with each other, namely, support-based pruning and table standardization. Finally, we present an algorithm that selects a small set of contingency tables such that an expert can choose a desirable measure by examining only those tables.
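
To illustrate how such measures can conflict, the following minimal sketch computes four of the measures named above from a 2x2 contingency table for a rule A -> B, using their standard textbook definitions. The function name `measures`, the count arguments `f11`, `f10`, `f01`, `f00`, and both example tables are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from math import sqrt


def measures(f11, f10, f01, f00):
    """Support, confidence, lift and phi-correlation for the rule A -> B.

    f11: transactions containing both A and B
    f10: transactions containing A but not B
    f01: transactions containing B but not A
    f00: transactions containing neither
    """
    n = f11 + f10 + f01 + f00
    p_ab = f11 / n              # P(A, B)
    p_a = (f11 + f10) / n       # P(A)
    p_b = (f11 + f01) / n       # P(B)
    return {
        "support": p_ab,
        "confidence": p_ab / p_a,
        "lift": p_ab / (p_a * p_b),
        "phi": (p_ab - p_a * p_b)
        / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b)),
    }


# Two hypothetical tables over 1000 transactions each.
t1 = measures(880, 50, 50, 20)   # A and B both very frequent
t2 = measures(20, 50, 50, 880)   # A and B both rare

for name in ("support", "confidence", "lift", "phi"):
    print(f"{name:>10}: t1={t1[name]:.4f}  t2={t2[name]:.4f}")
```

On these two tables, support and confidence rank the first table higher, lift ranks the second higher, and the phi-correlation does not distinguish them, because the second table is the first with presence and absence exchanged for both items.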
