Experiments in book indexing by computer

Abstract

The most challenging task in preparing an index to a book is to select all and only those terms that are related to the text and are useful for reference purposes. While a knowledgeable human can make the selection on an intuitive basis, automatic indexing requires a precise operational criterion for defining and selecting good and useful index terms. Two principles of selection are proposed: specifying and selecting useful terms, or specifying and excluding useless terms. Because the notion of “good index terms” is nebulous, and machine algorithms for selecting such terms are difficult to devise, this research in automatic indexing is based on the principle of excluding useless terms. Even so, fully automatic indexing was not achieved in this study. Single words proved to be of little value as index terms. Multiple-word terms were generated by the computer, but no algorithm could successfully eliminate the useless phrases, so the final selection had to be made by the experimenter. A comprehensive and useful book index was achieved by using machine-aided rather than fully automated indexing techniques.
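
The exclusion principle can be illustrated with a minimal sketch. It is not the procedure used in the study, which the abstract does not describe in detail. It assumes that "useless" terms are identified by a small stop list, and that candidate multiple-word terms are runs of two or three consecutive words containing no stop word. The names `STOP_WORDS`, `candidate_phrases`, and `max_len`, and the page-number dictionary used as input, are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stop list; the study's actual exclusion criteria are not
# given in the abstract.
STOP_WORDS = {
    "the", "a", "an", "of", "and", "in", "to", "is", "are",
    "for", "by", "on", "with", "that",
}


def candidate_phrases(pages, max_len=3):
    """Map each candidate multi-word phrase to the sorted pages it occurs on.

    pages: dict mapping page number -> page text.
    A candidate is a run of 2..max_len consecutive words, none of which is
    a stop word. Sentence boundaries are ignored in this sketch.
    """
    index = defaultdict(set)
    for page_no, text in pages.items():
        words = re.findall(r"[a-z]+", text.lower())
        for n in range(2, max_len + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Exclusion principle: drop any phrase containing a stop word.
                if not any(w in STOP_WORDS for w in gram):
                    index[" ".join(gram)].add(page_no)
    return {phrase: sorted(nums) for phrase, nums in index.items()}


if __name__ == "__main__":
    sample = {
        1: "Automatic indexing of books",
        2: "The automatic indexing experiment",
    }
    for phrase, page_numbers in sorted(candidate_phrases(sample).items()):
        print(phrase, page_numbers)
```

Even on this tiny input the sketch keeps phrases of doubtful reference value, such as "indexing experiment", alongside plausible ones such as "automatic indexing". This matches the finding above that exclusion alone does not eliminate useless phrases and that a human must make the final selection.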