Hierarchical Clustering for Thematic Browsing and Summarization of Large Sets of Association Rules

Abstract

In this paper we propose a method for grouping and summarizing large sets of association rules according to the items contained in each rule. We use hierarchical clustering to partition the initial rule set into thematically coherent subsets. This enables the summarization of the rule set by adequately choosing a representative rule for each subset, and helps in the interactive exploration of the rule model by the user. We define the requirements of our approach, and formally show the adequacy of the chosen approach to our aims. Rule clusters can also be used to infer novel interest measures for the rules. Such measures are based on the lexicon of the rules and are complementary to measures based on statistical properties, such as confidence, lift and conviction. We show examples of the application of the proposed techniques.

1 Introduction.

Despite being popular as a technique for market basket analysis, association rules [1][26] are now used in many different applications, from modeling web user preferences [9], to studying census data [6]. The apriori algorithm [2], and variants [6][20][23], among others, are the standard technique for association rule discovery. The mining process, however, is not finished when the rules are produced. A set of association rules is mostly a descriptive model that typically requires post processing before actionable information (information that can be acted upon in order to produce value [5]) is found. Moreover, due to the completeness of the rule discovery algorithm, the set of rules generated for a single problem can be very large, easily reaching hundreds or even thousands of rules [13].

Post processing techniques mainly encompass rule filtering (or pruning), using statistical measures of interest [6][13][22], rule set querying using SQL like lan-

[1]  Hans C. van Houwelingen, et al. The Elements of Statistical Learning, Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer, New York, 2001. No. of pages: xvi+533. ISBN 0-387-95284-5, 2004.

[2]  Giuseppe Psaila, et al. A New SQL-like Operator for Mining Association Rules, 1996, VLDB.

[3]  Alípio Mário Jorge, et al. Post-processing Operators for Browsing Large Sets of Association Rules, 2002, Discovery Science.

[4]  A. K. Pujari, et al. Data Mining Techniques, 2006.

[5]  Alípio Mário Jorge, et al. Recommendation with Association Rules: A Web Mining Application, 2002.

[6]  Yiming Ma, et al. Web for data mining: organizing and interpreting the discovered rules using the Web, 2000, SKDD.

[7]  Shamkant B. Navathe, et al. An Efficient Algorithm for Mining Association Rules in Large Databases, 1995, VLDB.

[8]  Shichao Zhang, et al. Association Rule Mining: Models and Algorithms, 2002.

[9]  Ke Wang, et al. Interestingness-Based Interval Merger for Numeric Association Rules, 1998, KDD.

[10]  David Newman, et al. Framework for a Generic Knowledge Discovery Toolkit, 1995, AISTATS.

[11]  Jennifer Widom, et al. Clustering association rules, 1997, Proceedings 13th International Conference on Data Engineering.

[12]  Abraham Silberschatz, et al. On Subjective Measures of Interestingness in Knowledge Discovery, 1995, KDD.

[13]  Heikki Mannila, et al. Fast Discovery of Association Rules, 1996, Advances in Knowledge Discovery and Data Mining.

[14]  Gediminas Adomavicius, et al. Handling very large numbers of association rules in the analysis of microarray data, 2002, KDD.

[15]  Michael J. A. Berry, et al. Data mining techniques - for marketing, sales, and customer support, 1997, Wiley computer publishing.

[16]  Roberto J. Bayardo, et al. Mining the most interesting rules, 1999, KDD '99.

[17]  Paulo J. Azevedo. CAREN - A Java based apriori implementation for classification purposes, 2003.

[18]  Rajeev Motwani, et al. Dynamic itemset counting and implication rules for market basket data, 1997, SIGMOD '97.

[19]  D. Ruppert. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2004.

[20]  Hannu Toivonen, et al. Sampling Large Databases for Association Rules, 1996, VLDB.

[21]  Ross Ihaka, Robert Gentleman. R: A language for data analysis and graphics, 1996.

[22]  Pang-Ning Tan, et al. Interestingness Measures for Association Patterns: A Perspective, 2000, KDD 2000.

[23]  Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[24]  Heikki Mannila, et al. Finding interesting rules from large sets of discovered association rules, 1994, CIKM '94.

[25]  Wynne Hsu, et al. Pruning and summarizing the discovered associations, 1999, KDD '99.

[26]  Jean-Marc Adamo, et al. Data Mining for Association Rules and Sequential Patterns, 2000, Springer New York.