Some hierarchical models for automatic document retrieval

Within the last few years, several automatic indexing and abstracting systems have been designed that rely primarily on word frequency counts and on techniques for measuring word and document associations. These systems are not wholly successful because they normally disregard both sentence structure and the semantic relations between words. The present study attempts to overcome the limitations of strictly quantitative methods by presenting two systems for automatic document retrieval that are based on hierarchical storage arrangements as well as on the usual frequency counts and association measures. The first uses a hierarchical arrangement similar to a library classification schedule, including lists of synonyms or related words and cross-references. The second adds a simplified form of syntactic analysis, which makes it possible to represent the syntactic dependency structure between individual words. The required retrieval operations are described briefly and compared with those of the simpler quantitative model.
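
As a rough illustration of the first kind of system, the sketch below stores terms in a small classification-style hierarchy with synonym lists and cross-references, and ranks documents by how many of their words fall under the categories a query term expands to. It is a minimal sketch under stated assumptions, not the study's own procedure: the names Category, build_index, expand, and retrieve, the sample hierarchy, and the simple match-count scoring are all hypothetical, and only single-word terms are handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Category:
    """One node of a hypothetical classification schedule."""
    name: str
    synonyms: Set[str] = field(default_factory=set)       # synonyms or related words
    see_also: List[str] = field(default_factory=list)     # cross-references by name
    children: List["Category"] = field(default_factory=list)


def build_index(root: Category) -> Tuple[Dict[str, Category], Dict[str, str]]:
    """Map category names to nodes, and every term (name or synonym) to its category."""
    nodes: Dict[str, Category] = {}
    term_to_cat: Dict[str, str] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        nodes[node.name] = node
        for term in {node.name} | node.synonyms:
            term_to_cat[term.lower()] = node.name
        stack.extend(node.children)
    return nodes, term_to_cat


def expand(term: str, nodes: Dict[str, Category], term_to_cat: Dict[str, str]) -> Set[str]:
    """Expand a query term to its category, all descendants, and direct cross-references."""
    name = term_to_cat.get(term.lower())
    if name is None:
        return set()
    result: Set[str] = set()
    stack = [nodes[name]]
    while stack:
        node = stack.pop()
        if node.name in result:
            continue
        result.add(node.name)
        stack.extend(node.children)
    # Assumption: cross-references are followed one step only.
    result.update(ref for ref in nodes[name].see_also if ref in nodes)
    return result


def retrieve(query_terms, documents, nodes, term_to_cat) -> List[str]:
    """Rank documents by the number of word occurrences under the expanded categories."""
    wanted: Set[str] = set()
    for term in query_terms:
        wanted |= expand(term, nodes, term_to_cat)
    scores = {}
    for doc_id, text in documents.items():
        count = sum(1 for word in text.lower().split() if term_to_cat.get(word) in wanted)
        if count:
            scores[doc_id] = count
    # Highest count first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample data, for illustration only.
root = Category("computing", children=[
    Category("retrieval", synonyms={"search", "lookup"}, see_also=["indexing"]),
    Category("indexing", synonyms={"cataloguing"}),
])
nodes, term_to_cat = build_index(root)
docs = {
    "d1": "automatic search and lookup of documents",
    "d2": "cataloguing rules for libraries",
    "d3": "weather report",
}
print(retrieve(["retrieval"], docs, nodes, term_to_cat))  # ['d1', 'd2']
```

In this sketch, a query on "retrieval" reaches d1 through the synonym list and d2 through the cross-reference to "indexing", although neither document contains the query word itself. A purely frequency-based match on the literal term would find neither. The second kind of system, which adds syntactic dependency structure, is not covered here.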