A Knowledge-Based System for Composite Document Analysis and Retrieval: Design Issues in the CODER Project

The CODER (COmposite Document Expert/Extended/Effective Retrieval) Project aims to apply a variety of methods developed in artificial intelligence to improve the performance of information retrieval systems. A prototype CODER system is being developed and will serve as a testbed for future research in this area. Initial experimentation will take place on a collection of more than three years of issues of the AIList ARPANET Digest. CODER is being developed in MU-Prolog and C++ as a collection of experts that communicate through central blackboards using UNIX™ pipes and the TCP/IP protocol. Because the system is distributed, it can be divided across several machines to make the best use of special display devices, storage facilities, and processors. The central spine includes the document text, document knowledge representations, and a large lexicon being constructed from two machine-readable English dictionaries. An entry/analysis subsystem carries out detailed analysis of composite documents, determining the structure and type of the whole and of each part. An access/retrieval subsystem maintains a model of each user, accommodates a variety of query languages, and supports browsing, searching, and immediate feedback. The design of such a system must address many issues, including knowledge representation, natural language processing, storage management, and support environments. This paper gives background, describes related work, explains the design principles and architecture, and closes with future plans.
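To make the "experts communicating through a blackboard" idea concrete, the following is a minimal, single-process sketch of the general blackboard pattern. It is not CODER's implementation: CODER is written in MU-Prolog and C++ and distributes its experts across processes and machines via UNIX pipes and TCP/IP, whereas this sketch keeps everything in one Python process for brevity. The class `Blackboard`, the experts `structure_expert` and `type_expert`, and the entry kinds `"document"`, `"part"`, and `"part_type"` are all hypothetical names chosen for illustration. The two experts loosely mimic the entry/analysis step of splitting a digest issue into parts and assigning each part a type.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# Hypothetical entry format: (kind, payload). Not CODER's actual representation.
Entry = Tuple[str, str]
Expert = Callable[[Entry], List[Entry]]


class Blackboard:
    """Minimal in-process blackboard: experts subscribe to entry kinds and
    react to each new entry of that kind by posting further entries."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self._experts: Dict[str, List[Expert]] = {}
        self._pending: Deque[Entry] = deque()

    def register(self, kind: str, expert: Expert) -> None:
        self._experts.setdefault(kind, []).append(expert)

    def post(self, entry: Entry) -> None:
        self.entries.append(entry)
        self._pending.append(entry)

    def run(self) -> None:
        # Control loop: hand each new entry to every expert interested in its
        # kind, until no expert has anything further to contribute.
        while self._pending:
            entry = self._pending.popleft()
            for expert in self._experts.get(entry[0], []):
                for new_entry in expert(entry):
                    self.post(new_entry)


def structure_expert(entry: Entry) -> List[Entry]:
    # Hypothetical structure analysis: split a document into parts at blank lines.
    return [("part", p.strip()) for p in entry[1].split("\n\n") if p.strip()]


def type_expert(entry: Entry) -> List[Entry]:
    # Hypothetical, crude type analysis: parts that start with a mail-style
    # header line are labeled "header"; everything else is labeled "body".
    text = entry[1]
    kind = "header" if text.startswith(("From:", "Subject:", "Date:")) else "body"
    return [("part_type", f"{kind}: {text.splitlines()[0]}")]


# Usage: post one tiny digest-like document and let the experts react.
bb = Blackboard()
bb.register("document", structure_expert)
bb.register("part", type_expert)
bb.post(("document", "From: someone\nSubject: AI news\n\nToday's topics: ..."))
bb.run()
for kind, payload in bb.entries:
    print(kind, "|", payload)
```

The experts never call each other directly. They see only what appears on the blackboard, which is what allows a real system to place them in separate processes or on separate machines.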