Automatic term classifications and retrieval

Abstract

Recent research at the Cambridge Language Research Unit has been concerned with applying the automatic classification techniques associated with the “theory of clumps” to document descriptions obtained from the Aslib-Cranfield project, and with using the resulting term classifications in retrieval. A substantial program engine has been developed which computes similarities between pairs of terms on the basis of their occurrences and co-occurrences in document descriptions, and finds classes of terms with strong similarity connections by minimizing the cohesion between a potential clump and its complement. The same program retrieves using single terms and/or term classes according to specification, and calculates recall and precision ratios for sets of requests. Serious tests with different similarity and clump definitions, and with different modes of using term classes, are still in progress, so no definite conclusions about the value of this kind of classification are presented. The experiments carried out so far nevertheless raise some important questions about clump and similarity definitions, about the nature of term classes, about their use in retrieval, and about the evaluation of their performance; these questions are discussed in detail.
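
As an illustration of the two computations mentioned in the abstract, the following is a minimal sketch, not the program described here. The abstract does not state which similarity definition was used, so the sketch assumes the Tanimoto coefficient (documents containing both terms divided by documents containing either). The function names `term_similarities` and `recall_precision` and the sample document descriptions are hypothetical.

```python
from itertools import combinations


def term_similarities(docs):
    """Return a similarity for every pair of terms.

    `docs` maps a document id to its list of index terms.
    Assumption: similarity is the Tanimoto coefficient, i.e.
    (documents with both terms) / (documents with either term).
    """
    # Record the set of documents in which each term occurs.
    occurrences = {}
    for doc_id, terms in docs.items():
        for term in set(terms):
            occurrences.setdefault(term, set()).add(doc_id)

    sims = {}
    for a, b in combinations(sorted(occurrences), 2):
        both = len(occurrences[a] & occurrences[b])
        either = len(occurrences[a] | occurrences[b])
        sims[(a, b)] = both / either
    return sims


def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one request."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical document descriptions, for illustration only.
docs = {
    "d1": ["wing", "flutter", "boundary"],
    "d2": ["wing", "flutter"],
    "d3": ["boundary", "layer"],
}
sims = term_similarities(docs)
ratios = recall_precision(retrieved=["d1", "d3"], relevant=["d1", "d2"])
```

In this toy data "flutter" and "wing" always occur together, so their similarity is 1.0, while "boundary" and "layer" share one of two documents and score 0.5. A clump-finding step would then operate on such a similarity table; it is not sketched here because the abstract leaves the clump definition open.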