MAXIMUM-DEPTH INDEXING FOR COMPUTER RETRIEVAL OF ENGLISH LANGUAGE DATA

One of the simplest and yet most powerful methods for organizing natural language data for deep retrieval is the complete index. Such an index completely characterizes a text corpus for computer retrieval operations without prohibitive cost in space or time. A general‐purpose indexer that has been programmed for the IBM 7090 is described and discussed. This system, starting with unedited English text, produces an index of all or any subset of the words in that text. For each word indexed, the volume, chapter, paragraph, and sentence numbers of every occurrence in the text are cited. Words with the same root, such as farmer and farming, are cross-referenced to each other. Words that are almost precisely synonymous, such as Britain and England, are also cross-referenced. Uses of the index for finding information relevant to answering English questions are briefly described.
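
The indexing scheme summarized above can be illustrated with a minimal modern sketch. It is not the IBM 7090 program itself. The nested-list corpus layout, the function names build_index, crude_root, and cross_references, the suffix-stripping rule, and the SYNONYMS table are all assumptions made for illustration; the abstract does not say how the original system finds roots or synonyms.

```python
import re
from collections import defaultdict

# Hypothetical synonym table; the abstract names only Britain/England as an example.
SYNONYMS = {"britain": "england", "england": "britain"}


def crude_root(word):
    """Strip a few common suffixes. A stand-in rule, not the original system's method."""
    for suffix in ("ing", "ers", "er", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(corpus, stop_words=frozenset()):
    """Map each word to its (volume, chapter, paragraph, sentence) citations.

    corpus is assumed to be nested lists: volumes -> chapters -> paragraph strings.
    Words in stop_words are skipped, which gives an index of a subset of the words.
    """
    index = defaultdict(list)
    for v, volume in enumerate(corpus, 1):
        for c, chapter in enumerate(volume, 1):
            for p, paragraph in enumerate(chapter, 1):
                # Naive sentence split on terminal punctuation followed by whitespace.
                sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
                for s, sentence in enumerate(sentences, 1):
                    for word in re.findall(r"[a-z]+", sentence.lower()):
                        if word in stop_words:
                            continue
                        location = (v, c, p, s)
                        # Cite each sentence once, even if the word repeats in it.
                        if location not in index[word]:
                            index[word].append(location)
    return dict(index)


def cross_references(index):
    """For each indexed word, list other indexed words sharing its root or synonym."""
    by_root = defaultdict(set)
    for word in index:
        by_root[crude_root(word)].add(word)
    refs = {}
    for word in index:
        related = by_root[crude_root(word)] - {word}
        synonym = SYNONYMS.get(word)
        if synonym in index:
            related = related | {synonym}
        refs[word] = sorted(related)
    return refs
```

Under these assumptions, a question mentioning "farming" could be answered by looking up the citations for "farming" and then following its cross-references to "farmer", so that sentences using either form are retrieved.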