Object Modeling Using Classification in CANDIDE and its Applications

CANDIDE is a semantic object model based on the frame-based knowledge representation languages FL⁻, KANDOR and BACK. A novel feature of this model is that the DDL and DML are identical, which gives uniform treatment of data objects, query objects and view objects. The classification algorithm finds the correct placement in a given object taxonomy for a data object at definition time and for a query object at query time. The fundamental criterion for correct placement is the subsumption relationship between two object classes. CANDIDE's extensions to these languages make it a viable data model. Classification can be applied effectively to many database problems; here we describe two such applications. The first is the integration of a set of heterogeneous database systems, where classification is used for schema integration as well as query evaluation. The global schema is generated automatically by applying classification to each object of the component schemas in turn, which ensures the correctness of the global schema. The deductive reasoning provided by classification also benefits global query evaluation, for example by validating the correctness of queries before they are evaluated. The second application is a document retrieval system, in which the knowledge representation capability of CANDIDE is used to capture the structural and conceptual information in a document. The inexact querying capability based on classification makes continuous query refinement unnecessary. The expressiveness of CANDIDE also makes it well suited to natural language query interfaces: natural language queries are mapped into CANDIDE objects, which then act as query objects. Additionally, the semantics of the database objects are used directly by the language processor, leading to better language understanding.
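
To make the idea of placement by subsumption concrete, the following is a minimal sketch in Python and not CANDIDE's actual syntax or algorithm. It assumes a deliberately simplified frame representation in which a class is a set of attribute restrictions, each listing the allowed values. The names `Concept`, `subsumes` and `classify`, and the example taxonomy, are hypothetical and introduced only for illustration. A class is taken to subsume another when every one of its restrictions is satisfied by the other class; classification then returns the most specific subsumers and the most general subsumees of a new object.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    """Hypothetical frame: attribute name -> frozenset of allowed values."""
    name: str
    restrictions: dict


def subsumes(general, specific):
    """True if every restriction of `general` is implied by `specific`."""
    return all(
        attr in specific.restrictions and specific.restrictions[attr] <= allowed
        for attr, allowed in general.restrictions.items()
    )


def strictly_subsumes(a, b):
    return subsumes(a, b) and not subsumes(b, a)


def classify(new, taxonomy):
    """Return (parents, children) of `new` in `taxonomy`.

    parents:  most specific classes that subsume `new`
    children: most general classes that `new` subsumes
    """
    subsumers = [c for c in taxonomy if subsumes(c, new)]
    subsumees = [c for c in taxonomy if subsumes(new, c)]
    parents = [p for p in subsumers
               if not any(strictly_subsumes(p, q) for q in subsumers)]
    children = [c for c in subsumees
                if not any(strictly_subsumes(q, c) for q in subsumees)]
    return parents, children


# Assumed example taxonomy (illustrative data, not from the paper).
taxonomy = [
    Concept("THING", {}),
    Concept("PERSON", {"kind": frozenset({"person"})}),
    Concept("EMPLOYEE", {"kind": frozenset({"person"}),
                         "status": frozenset({"full-time", "part-time"})}),
    Concept("FULLTIME_MANAGER", {"kind": frozenset({"person"}),
                                 "status": frozenset({"full-time"}),
                                 "role": frozenset({"manager"})}),
]

# A query is just another object, classified the same way as a data object.
query = Concept("FULLTIME", {"kind": frozenset({"person"}),
                             "status": frozenset({"full-time"})})
parents, children = classify(query, taxonomy)
```

In this sketch the query object is placed below `EMPLOYEE` and above `FULLTIME_MANAGER`, so everything at or below its position would be an answer to the query. This mirrors the uniform treatment of data and query objects described above, under the simplifying assumptions stated.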