Data mining has been informally introduced as the large-scale search for interesting patterns in data. It is often an exploratory task performed iteratively within the process of knowledge discovery in databases. In this process, interactive visualization techniques are also applied successfully for data exploration. We deal with the synergy of these two complementary approaches. Whereas data mining typically relies on strategies for systematic search in large hypothesis spaces, guided by the autonomous evaluation of statistical tests, interactive visualization engages the visual capacities of an analyst to identify patterns, which may in turn suggest the further direction of the exploration process. We demonstrate some possibilities for combining these approaches in the area of data mining in document collections. Document Explorer is a system that offers various preprocessing tools to prepare collections of text or multimedia documents available in distributed environments (e.g. Internet and intranet) for data mining applications, and it includes data mining methods based on searching for patterns such as frequent sets or association rules. Keyword graphs are used in this system as a highly interactive technique for presenting the mining results. The user can operate on the visualized results to redirect the data mining process, to filter and structure the results, to link several graphs, or to browse into the document collection. Thus, the keyword graphs present the relations between interesting sets of keywords (the sets may also be regarded as retrieval queries to be posed to the collection) and make them operable by the analyst.
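To make the notions of frequent keyword sets and keyword graphs concrete, the following minimal sketch may help. It is not the Document Explorer implementation, whose algorithms are not specified here. It assumes that each document has already been reduced to a set of keywords, uses a simple level-wise (Apriori-style) search, and derives graph edges from the frequent keyword pairs. All function names and the sample data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_keyword_sets(docs, min_support, max_size=3):
    """Return {keyword set: support} for sets found in >= min_support documents.

    Level-wise search: candidates of size k are built only from keywords
    that survived level k-1, and are pruned unless all (k-1)-subsets are frequent.
    """
    docs = [frozenset(d) for d in docs]

    def support(itemset):
        # Number of documents that contain every keyword of the set.
        return sum(1 for d in docs if itemset <= d)

    current = {}
    for keyword in sorted(set().union(*docs)):
        single = frozenset([keyword])
        count = support(single)
        if count >= min_support:
            current[single] = count

    result = dict(current)
    size = 1
    while current and size < max_size:
        size += 1
        items = sorted(set().union(*current))
        next_level = {}
        for candidate in combinations(items, size):
            subsets = combinations(candidate, size - 1)
            if all(frozenset(s) in current for s in subsets):
                cand = frozenset(candidate)
                count = support(cand)
                if count >= min_support:
                    next_level[cand] = count
        current = next_level
        result.update(current)
    return result


def keyword_graph_edges(frequent):
    """Edges of a keyword graph: one weighted edge per frequent keyword pair."""
    return {tuple(sorted(s)): n for s, n in frequent.items() if len(s) == 2}


def matching_documents(docs, keyword_set):
    """Treat a keyword set as a conjunctive retrieval query over the collection."""
    query = frozenset(keyword_set)
    return [i for i, d in enumerate(docs) if query <= frozenset(d)]


# Invented sample collection: each document is a set of keywords.
DOCS = [
    {"iran", "oil", "usa"},
    {"iran", "oil"},
    {"oil", "usa"},
    {"iran", "usa", "oil"},
    {"gold", "usa"},
]

FREQUENT = frequent_keyword_sets(DOCS, min_support=2)
EDGES = keyword_graph_edges(FREQUENT)
```

In this sketch, an edge weight is the number of documents in which both keywords occur, and `matching_documents(DOCS, {"iran", "usa"})` shows how a keyword set selected in the graph could serve as a query for browsing back into the collection.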