Substructure Discovery in the SUBDUE System

Because many databases contain or can be embellished with structural information, a method for identifying interesting and repetitive substructures is an essential component of knowledge discovery in such databases. This paper describes the SUBDUE system, which uses the minimum description length (MDL) principle to discover substructures that compress the database and represent structural concepts in the data. By replacing previously discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. Inclusion of background knowledge guides SUBDUE toward appropriate substructures for a particular domain or discovery goal, and the use of an inexact graph match allows a controlled amount of deviation in the instances of a substructure concept. We describe the application of SUBDUE to a variety of domains. We also discuss approaches to combining SUBDUE with non-structural discovery systems.
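
The compression idea in the abstract can be illustrated with a minimal sketch. It is not SUBDUE's actual encoding: description length is approximated here by a simple vertex-plus-edge count rather than a bit-level MDL encoding, the instances of the substructure are supplied by hand rather than found by graph matching, and all function and variable names are hypothetical. A candidate substructure S is scored by the ratio DL(G) / (DL(S) + DL(G|S)), where G|S is the graph after each instance of S is replaced by a single vertex.

```python
def size(num_vertices, num_edges):
    # Assumed proxy for description length: one unit per vertex and per edge.
    return num_vertices + num_edges


def compressed_size(vertices, edges, instances):
    """Size of the graph after each instance collapses to one vertex.

    vertices:  set of vertex ids
    edges:     list of (u, v) pairs
    instances: list of disjoint sets of vertex ids, one set per instance
    """
    owner = {v: i for i, inst in enumerate(instances) for v in inst}
    # Vertices outside every instance survive; each instance becomes one vertex.
    kept_vertices = sum(1 for v in vertices if v not in owner) + len(instances)
    # Edges wholly inside a single instance disappear; all other edges remain.
    kept_edges = sum(
        1 for u, v in edges
        if not (u in owner and v in owner and owner[u] == owner[v])
    )
    return size(kept_vertices, kept_edges)


def compression_value(vertices, edges, sub_vertices, sub_edges, instances):
    # Ratio above 1 means the substructure shortens the overall description.
    before = size(len(vertices), len(edges))
    after = size(sub_vertices, sub_edges) + compressed_size(vertices, edges, instances)
    return before / after


# Toy graph: two triangles joined by one bridge edge.
vertices = {1, 2, 3, 4, 5, 6}
edges = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4), (3, 4)]
triangle_instances = [{1, 2, 3}, {4, 5, 6}]

value = compression_value(vertices, edges, 3, 3, triangle_instances)
print(round(value, 3))
```

In this toy case the original graph has size 13, the triangle definition has size 6, and the compressed graph has size 3, so the ratio is 13/9. Repeating the procedure on the compressed graph is one way to picture the multiple passes that yield a hierarchical description.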
