Graph-based hierarchical conceptual clustering

Hierarchical conceptual clustering has proven to be a useful, although under-explored, data mining technique. A graph-based representation of structural information, combined with a substructure discovery technique, has been shown to be successful in knowledge discovery. The SUBDUE substructure discovery system provides one such combination of approaches. This work presents SUBDUE and the development of its clustering functionality. Several examples illustrate the validity of the approach in both structured and unstructured domains and compare SUBDUE to the Cobweb clustering algorithm. We also develop a new metric for comparing structurally defined clusterings. Results show that SUBDUE successfully discovers hierarchical clusterings in both structured and unstructured data.
