Paper information - Hierarchical Clustering for Thematic Browsing and Summarization of Large Sets of Association Rules

Abstract

In this paper we propose a method for grouping and summarizing large sets of association rules according to the items contained in each rule. We use hierarchical clustering to partition the initial rule set into thematically coherent subsets. This enables the summarization of the rule set by adequately choosing a representative rule for each subset, and helps in the interactive exploration of the rule model by the user. We define the requirements of our approach, and formally show the adequacy of the chosen approach to our aims. Rule clusters can also be used to infer novel interest measures for the rules. Such measures are based on the lexicon of the rules and are complementary to measures based on statistical properties, such as confidence, lift and conviction. We show examples of the application of the proposed techniques.

1 Introduction.

Despite being popular as a technique for market basket analysis, association rules [1][26] are now used in many different applications, from modeling web user preferences [9], to studying census data [6]. The apriori algorithm [2], and variants [6][20][23], among others, are the standard technique for association rule discovery. The mining process, however, is not finished when the rules are produced. A set of association rules is mostly a descriptive model that typically requires post processing before actionable information (information that can be acted upon in order to produce value [5]) is found. Moreover, due to the completeness of the rule discovery algorithm, the set of rules generated for a single problem can be very large, easily reaching hundreds or even thousands of rules [13].

Post processing techniques mainly encompass rule filtering (or pruning), using statistical measures of interest [6][13][22], rule set querying using SQL like lan-

Alípio Mário Jorge

[1] Hans C. van Houwelingen, et al. The Elements of Statistical Learning, Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer, New York, 2001. No. of pages: xvi+533. ISBN 0-387-95284-5, 2004.

[2] Giuseppe Psaila, et al. A New SQL-like Operator for Mining Association Rules, 1996, VLDB.

[3] Alípio Mário Jorge, et al. Post-processing Operators for Browsing Large Sets of Association Rules, 2002, Discovery Science.

[4] A. K. Pujari, et al. Data Mining Techniques, 2006.

[5] Alípio Mário Jorge, et al. Recommendation with Association Rules: A Web Mining Application, 2002.

[6] Yiming Ma, et al. Web for data mining: organizing and interpreting the discovered rules using the Web, 2000, SKDD.

[7] Shamkant B. Navathe, et al. An Efficient Algorithm for Mining Association Rules in Large Databases, 1995, VLDB.

[8] Shichao Zhang, et al. Association Rule Mining: Models and Algorithms, 2002.

[9] Ke Wang, et al. Interestingness-Based Interval Merger for Numeric Association Rules, 1998, KDD.

[10] David Newman, et al. Framework for a Generic Knowledge Discovery Toolkit, 1995, AISTATS.

[11] Jennifer Widom, et al. Clustering association rules, 1997, Proceedings 13th International Conference on Data Engineering.