Development of a large, concept-oriented database for information retrieval

The development of concept-oriented databases using AI knowledge representation schemes is proposed as a step towards improving the precision and recall of information retrieval systems. Currently underway is the augmentation of a 238,000-citation database, Chemical Abstracts (CA) Volume 105, by the addition of detailed conceptual information in the form of frames and hierarchies. The initial text data is parsed using natural language processing (NLP) techniques to create frames describing the semantics of the index entries in the database, with the slots in the frames being pointers into a very large semantic network of conceptual objects (956,000 objects). To examine the resultant knowledge base (KB), a simple hypertext system is proposed, in which the conceptual information serves as pathways connecting related citations.
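The sketch below illustrates the frame-and-network arrangement in miniature. It is a minimal in-memory illustration, not the system described here. The abstract does not give the schema, so everything in the sketch is hypothetical: the concept ids, the slot names (`substance`, `process`), the citation ids, and the linking rule. Each frame belongs to one citation, and its slot values are ids of concepts in a semantic network rather than free text. Concepts carry is-a links to broader concepts, which form the hierarchy. A hypertext "pathway" is taken to be a shared concept. Two citations are linked when their frames point at the same concept. They can also be linked, optionally, when their concepts share an ancestor in the hierarchy.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in the semantic network (hypothetical schema)."""
    name: str
    parents: list[str] = field(default_factory=list)  # is-a links to broader concepts


@dataclass
class Frame:
    """Semantics of one index entry; slot values are concept ids, not strings."""
    citation: str
    slots: dict[str, str]


class KnowledgeBase:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        self._slot_concepts = defaultdict(set)  # citation -> concept ids its frames point at
        self._exact = defaultdict(set)          # concept id -> citations pointing at it
        self._general = defaultdict(set)        # concept id -> citations pointing at it or a descendant

    def add_concept(self, cid: str, name: str, parents=()) -> None:
        self.concepts[cid] = Concept(name, list(parents))

    def ancestors(self, cid: str) -> set[str]:
        """All concepts reachable from cid via is-a links, including cid itself."""
        seen, stack = set(), [cid]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.concepts[c].parents)
        return seen

    def add_frame(self, frame: Frame) -> None:
        # Concepts (and their parents) must be registered before frames point at them.
        for cid in frame.slots.values():
            if cid not in self.concepts:
                raise KeyError(f"slot points at unknown concept {cid!r}")
            self._slot_concepts[frame.citation].add(cid)
            self._exact[cid].add(frame.citation)
            for anc in self.ancestors(cid):
                self._general[anc].add(frame.citation)

    def related(self, citation: str, generalize: bool = False) -> set[str]:
        """Citations reachable from `citation` through a shared concept (hypertext links)."""
        mine = self._slot_concepts.get(citation, set())
        if generalize:
            # Widen each slot concept to everything above it in the hierarchy.
            mine = set().union(*(self.ancestors(c) for c in mine))
        index = self._general if generalize else self._exact
        linked = set()
        for cid in mine:
            linked |= index.get(cid, set())
        linked.discard(citation)
        return linked


# Toy data (hypothetical): two aromatic compounds under a common parent concept.
kb = KnowledgeBase()
kb.add_concept("aromatic", "aromatic compound")
kb.add_concept("benzene", "benzene", parents=["aromatic"])
kb.add_concept("toluene", "toluene", parents=["aromatic"])
kb.add_concept("nitration", "nitration")
kb.add_frame(Frame("cit-1", {"substance": "benzene", "process": "nitration"}))
kb.add_frame(Frame("cit-2", {"substance": "toluene"}))
kb.add_frame(Frame("cit-3", {"substance": "benzene"}))

print(sorted(kb.related("cit-1")))                   # ['cit-3']
print(sorted(kb.related("cit-1", generalize=True)))  # ['cit-2', 'cit-3']
```

Because slots hold pointers rather than strings, two index entries that word the same substance differently still meet at one concept node. The hierarchy then lets the browser widen a link from "benzene" to "aromatic compound". A network of the stated size (956,000 objects) would presumably need persistent storage instead of Python dictionaries.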