The Cranfield Hypotheses [with Comment]

The paper by Swanson¹ on the Cranfield experiments is a welcome contribution to the literature on the problems of designing tests of information-retrieval systems. As he says, the report of the Cranfield tests should have been critically reviewed soon after publication, and from our viewpoint it is unfortunate that his paper has appeared only now, when it has so little relevance to the techniques being used in the present Cranfield work. However, it clearly points out the weaknesses in the design of the original project and should certainly be required reading for anyone who comes fresh to the field of evaluation. Although Swanson has produced an impressive list of references on the Cranfield work, some recent proposals for evaluating information-retrieval systems are so full of similar or worse design errors that it would seem the Cranfield projects might never have been done. It is not my intention to attempt to answer all the criticisms made by Swanson. Many are unanswerable, except, of course, in the general terms that this was a test designed in 1956. However, I cannot help but feel that, in spite of its admitted weaknesses, the test design must, overall, have been reasonably good, in that it provided data which enabled us to list a number of tentative conclusions that are being substantiated by later and more rigorous tests. Swanson lists ten such findings; I would group some of these and reduce the conclusions which he quotes to the following points.
1. No significant improvement in indexing is likely beyond an indexing time of four minutes.
2. Trained indexers are able to do consistently good indexing even though they lack subject knowledge.
3. Indications are that information-retrieval systems normally operate at a recall ratio between 70 and 90 per cent and a precision ratio in the range of 8-20 per cent.²
4. There is an optimum level of exhaustivity of indexing. To index beyond this limit will do little to improve the recall ratio but will seriously weaken the precision ratio.
5. There is an inevitable inverse relationship between recall and precision.
6. Within the normal operating range of a system, a 1 per cent improvement in the relevance (precision) ratio will result in a 3 per cent drop in recall.
7. The most significant result of the main test program was that all four indexing methods were operating at