Evaluating the effectiveness of information retrieval systems using simulated queries

Simulation is a widely used investigative tool for exploring proposed systems without incurring the costs of actually building them. Studying the effectiveness of document retrieval systems by means of simulation, however, has remained elusive because of the guesswork involved in deciding whether a document with a given (simulated) description will be relevant to the information need of an inquirer expressed by a given (simulated) query. In this article, a simulation method is described for estimating recall and fallout in a document retrieval system. Examples are presented to illustrate the method, and it is justified probabilistically. By using this method, we may compare the effectiveness of employing various indexing procedures, compare the effectiveness of employing various matching functions, or obtain absolute effectiveness measures for a proposed system.