We propose a new methodology for visualizing association mining results. Inter-item distances are computed from combinations of itemset supports. The new distances retain a simple pairwise structure and are consistent with important frequently occurring itemsets. Standard visualization tools, e.g., hierarchical clustering dendrograms, can therefore still be applied, while the distance information on which they are based is richer. Our approach applies to general association mining applications as well as to applications involving information spaces modeled by directed graphs, e.g., the Web. For collections of hypertext documents, the inter-document distances capture the information inherent in a collection's link structure, which amounts to a form of link mining. We demonstrate our methodology with document sets extracted from the Science Citation Index, applying a metric that measures consistency between clusters and frequent itemsets.
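To make the idea of deriving pairwise distances from itemset supports concrete, the following is a minimal sketch, not the formula used in this work. It assumes that each transaction is the set of documents cited by one citing paper, that the similarity of two items is the sum of the supports of all frequent itemsets containing both, and that distance is one minus the normalized similarity. The function names, the `max_size` and `min_support` parameters, and the toy data are all hypothetical.

```python
from itertools import combinations


def itemset_supports(transactions, max_size=3, min_support=2):
    """Count transactions containing each itemset of size 2..max_size.

    Brute-force enumeration, adequate only for small illustrative inputs.
    """
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(2, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] = counts.get(itemset, 0) + 1
    # Keep only itemsets that meet the (assumed) support threshold.
    return {s: c for s, c in counts.items() if c >= min_support}


def pairwise_distances(transactions, max_size=3, min_support=2):
    """Assumed rule: similarity(i, j) = sum of supports of the frequent
    itemsets containing both i and j; distance = 1 - similarity / max."""
    supports = itemset_supports(transactions, max_size, min_support)
    similarity = {}
    for itemset, count in supports.items():
        for pair in combinations(itemset, 2):
            similarity[pair] = similarity.get(pair, 0) + count
    top = max(similarity.values(), default=0)
    items = sorted({item for t in transactions for item in t})
    distances = {}
    for pair in combinations(items, 2):
        # Pairs that share no frequent itemset get the maximum distance.
        distances[pair] = (1.0 - similarity.get(pair, 0) / top) if top else 1.0
    return distances


# Toy data: each inner list is the reference list of one citing paper.
transactions = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B"],
    ["C", "D"],
    ["C", "D"],
]
distances = pairwise_distances(transactions)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Because the result is still an ordinary pairwise distance table, it could be passed to a standard hierarchical clustering routine (for example, as a condensed distance vector to SciPy's `scipy.cluster.hierarchy.linkage`) to draw a dendrogram. In this toy data the pair (A, B) ends up closer than a plain pairwise co-occurrence count alone would suggest, because the frequent 3-itemset {A, B, C} also contributes to its similarity.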