Conceptual Clustering in Database Systems

Classes are an integral part of all semantic data models. Despite this, class formation in these data models is ad hoc, both because of the varied treatment of classes and because grouping instances into classes is considered an art rather than a science. It is the view of this paper that class formation should be based on category theory through the use of an attribute-based, purpose-directed conceptual clustering technique. Several issues concerned with category theory, especially exception handling, are discussed. The emphasis in this approach is on reasoning at the instance level. Schema generation occurs as a result of conceptually clustering the underlying data instances, a process guided by a context specified in the form of a clustering seed. The use of this approach in the areas of schema integration, schema evolution, and querying is also discussed. These facilities have been implemented on a database system based on the CANDIDE [3] semantic data model. CANDIDE is essentially an extended version of the term-subsumption languages known as the KL-ONE family of languages.