Automatic indexing: a tutorial

SIGIR, from its very inception in the early 1960s, has been concerned with the development of automatic information retrieval systems and with improving the effectiveness of automatic indexing. Automatic indexing is a subsystem, or component, of an automatic information retrieval system. The term "automatic" implies that the process is to be accomplished by a set of computer programs rather than by the intellectual effort of skilled people. The essential questions that need to be answered are how automatic indexing can:

• Adequately represent the subject content of a document;
• Improve recall by increasing the number of relevant documents retrieved;
• Improve precision by decreasing the number of non-relevant documents retrieved.

In this tutorial, I will review the basic and improved procedures that have been devised to respond to each of these questions. Finally, after we have reviewed developments in automatic indexing, we will consider the more radical questions of:

• Whether we can rely exclusively on automatic indexing to achieve adequate retrieval effectiveness in online interactive document retrieval systems, and
• Whether the traditional recall and precision ratios are the best criteria for evaluating the effectiveness of online retrieval systems.
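For readers less familiar with the recall and precision ratios referred to above, the following is a minimal sketch under the usual set-based definitions: recall is the fraction of relevant documents that were retrieved, and precision is the fraction of retrieved documents that are relevant. The function name, variable names, and document identifiers are hypothetical and chosen only for illustration; they do not come from the tutorial.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a single query.

    Assumes the standard set-based definitions; an empty denominator
    is reported as 0.0 by convention in this sketch.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical document identifiers for one query.
retrieved = {"d1", "d2", "d3", "d4"}   # what the system returned
relevant = {"d2", "d4", "d5"}          # what a judge marked relevant

recall, precision = recall_precision(retrieved, relevant)
```

In this example two of the three relevant documents are retrieved, and two of the four retrieved documents are relevant. Retrieving more documents can raise recall while lowering precision, which is why the two ratios are reported together.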