AID, an Associative Interactive Dictionary for online searching

The paper describes the prototype Associative Interactive Dictionary (AID) system for search strategy formulation on a large operational free text on‐line bibliographic retrieval system. The primary design objective of the Associative Interactive Dictionary is the automatic generation and display of related terms, synonyms, broader and narrower terms, and other semantic associations for given search concepts. The associative analysis procedures rely on statistical frequency distribution information: the occurrence frequencies of terms in a set of document texts retrieved in response to a Boolean search query are compared with the occurrence frequencies of the same terms in the entire data base. Over the past two decades, a number of small experimental retrieval systems have utilized term associations for automatic or semiautomatic document classification, indexing, thesaurus building, or as a search aid. These experimental systems primarily employ term‐term and term‐document matrices for the computation of similarity measures between and among terms and documents. The matrix technique cannot be implemented efficiently and cost‐effectively on large operational retrieval systems because of problems of scale. The major on‐line bibliographic search systems, such as ELHILL, ORBIT, DIALOG, RECON, BRS and others, do not provide any search aids other than the inherent browsing capability, term truncation and/or sequential string searching. In some files, manually constructed on‐line thesauri offer partial assistance to the user. The prototype AID system overcomes the problems of scale by utilizing a computationally efficient similarity measure and a highly compressed in‐core hash table of terms and term frequencies. The hash table can accommodate tens of thousands of free text search terms.
Both an on‐line version and a batch version of the Associative Interactive Dictionary system are currently operational on TOXLINE, a large file of over 400,000 journal citations with abstracts on toxicology and the environment. TOXLINE is one of several on‐line data bases on the National Library of Medicine's ELHILL retrieval system. The overall design of the AID system is general in nature, and therefore it can be implemented on other large operational retrieval systems.
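
As an illustration of the frequency comparison described above, the following is a minimal sketch only. The abstract does not state the similarity measure AID uses, so the ratio of relative frequencies below is an assumption chosen for simplicity. The function name, parameters, sample texts and frequency counts are all hypothetical, and an ordinary Python dictionary stands in for the compressed in-core hash table.

```python
from collections import Counter


def associated_terms(retrieved_docs, collection_freq, collection_size,
                     min_count=2, top_n=10):
    """Rank terms that are over-represented in a retrieved set.

    retrieved_docs  : list of document texts returned by a Boolean query
    collection_freq : dict mapping term -> number of documents in the whole
                      data base containing the term (hash table stand-in)
    collection_size : total number of documents in the data base
    """
    # Count, for each term, how many retrieved documents contain it.
    doc_freq = Counter()
    for text in retrieved_docs:
        doc_freq.update(set(text.lower().split()))

    n_retrieved = len(retrieved_docs)
    scored = []
    for term, count in doc_freq.items():
        if count < min_count:
            continue  # ignore terms seen too rarely in the retrieved set
        base = collection_freq.get(term)
        if not base:
            continue  # no data-base frequency available for this term
        # Assumed measure: relative frequency in the retrieved set divided
        # by relative frequency in the entire data base.
        score = (count / n_retrieved) / (base / collection_size)
        scored.append((term, score))

    # Highest score first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Hypothetical data for illustration only.
retrieved = [
    "lead poisoning in children",
    "lead exposure and anemia in children",
    "chronic lead poisoning",
]
collection_freq = {
    "lead": 4000, "poisoning": 9000, "children": 12000, "in": 350000,
    "anemia": 3000, "exposure": 20000, "and": 380000, "chronic": 15000,
}
ranking = associated_terms(retrieved, collection_freq, collection_size=400000)
for term, score in ranking:
    print(term, round(score, 2))
```

Under this assumed measure, common function words such as "in" score below 1 because they are no more frequent in the retrieved set than in the data base as a whole, while topical terms score well above 1. Only one pass over the retrieved texts and one table lookup per term are needed, so no term‐term or term‐document matrix is built.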