Panel: Evaluating Interactive Retrieval Systems

Most current information retrieval systems are highly interactive: users issue queries, get immediate feedback, refine their queries, and so on. Methods for evaluating these dynamic systems have not kept pace with the rapid advances in system design. It is no longer enough to use the standard precision-recall measures to evaluate and improve interactive retrieval systems. There is often no single final query to evaluate; useful information is gathered from many different queries along the way.

In addition, interfaces play a critical role in building effective retrieval systems. The best retrieval algorithm can be rendered functionally useless if its interface is unusable. Conversely, of course, the spiffiest new interface is not worth much without a good retrieval engine behind it. It would be easy if one could study interfaces and retrieval engines separately and take the best of both worlds. Unfortunately, there are important interactions that cannot be evaluated by studying components in isolation: how, for example, do you incorporate ranking or relevance feedback in a Boolean retrieval engine, or highlight matching terms when queries undergo complex syntactic and semantic processing? The design of effective interactive retrieval environments will require careful attention to the larger human-interface-retrieval-engine system.
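For concreteness, the standard measures in question are simple set-based ratios computed over a single query's result set. The minimal sketch below (the function name and the toy document ids are illustrative, not taken from any particular system) makes plain why they presume one static query:

    def precision_recall(retrieved, relevant):
        """Set-based precision and recall for a single query's results."""
        retrieved, relevant = set(retrieved), set(relevant)
        hits = retrieved & relevant
        # Precision: fraction of retrieved documents that are relevant.
        precision = len(hits) / len(retrieved) if retrieved else 0.0
        # Recall: fraction of relevant documents that were retrieved.
        recall = len(hits) / len(relevant) if relevant else 0.0
        return precision, recall

    # One query evaluated in isolation; the measures say nothing about
    # the session of reformulated queries that led to this result set.
    p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 7])
    print(p, r)  # 0.5 0.666...

Such numbers can be computed per query, but averaging them over a session ignores the information a user accumulates across reformulations, which is precisely the gap the panel addresses.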