Automatic Classification of Book Material Represented by Back‐of‐the‐Book Index

A research project is reported in which techniques for the automatic classification of book material were investigated. Attention was focussed on three fundamental issues: the computer‐based surrogation of monographic material, the clustering of book surrogates on the basis of content association, and the evaluation of the resultant classifications. A test collection of 250 books, which was assembled for the project, is described together with its surrogation by means of the complete back‐of‐the‐book index, table of contents, title and Dewey classification code(s) of each volume. Some properties of hierarchic and non‐hierarchic automatic classifications of the test collection are discussed, followed by their evaluation with reference to a small set of queries and relevance judgements. Finally, a less formal evaluation of the classifications in terms of the logical appeal of the cluster membership is reported. The work has shown that, on a small experimental scale and in the context of the test data used, automatic classifications of book material represented by their index lists can be produced which are superior, on the basis of a generalized measure of effectiveness, to a conventional library classification of the same material.