A utility theoretic approach to determining optimal wait times in distributed information retrieval

Distributed IR systems query a large number of IR servers, merge the retrieved results, and display them to users. Because the servers handle collections of different sizes and have different processing and bandwidth capacities, their response times can be highly heterogeneous. The broker in a distributed IR system therefore has to decide when to terminate a search, weighing the perceived value of waiting (retrieving more documents) against the costs that waiting for more responses imposes on users. In this paper, we apply utility theory to formulate the broker's decision problem, which is a stochastic nonlinear program. We use Monte Carlo simulations to demonstrate how the optimal wait time may be determined in the context of a comparison shopping engine that queries multiple store websites for price and product information, using data gathered from 30 stores for a set of 60 books. Our research demonstrates how a broker can leverage information from past retrievals about the distributions of server response times and relevance scores to optimize its performance. Our main contributions are the formulation of the decision model for the optimal wait time and the proposal of a solution method. Our results suggest that the optimal wait time is highly sensitive to how user value from a set of retrieved results differs from the sum of user values from each result evaluated independently. We also find that the optimal wait time increases with the size of the distributed collections, but only if user utility from a set of results is nearly equal to the sum of the utilities from each result.
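
The trade-off described above can be illustrated with a minimal Monte Carlo sketch. It is not the paper's model: the exponential response times, uniform per-result values, the power-law aggregation `alpha` (where `alpha = 1` makes set utility equal to the sum of individual utilities and `alpha < 1` makes it sub-additive), the linear waiting cost, and all function and parameter names are assumptions chosen for illustration only.

```python
import random

def sample_scenarios(n_trials, n_servers, mean_response, seed=0):
    """Draw hypothetical (response_time, value) pairs for each server.

    Assumptions: response times are exponential with the given mean,
    and each server's result value is uniform on [0, 1).
    """
    rng = random.Random(seed)
    return [
        [(rng.expovariate(1.0 / mean_response), rng.random())
         for _ in range(n_servers)]
        for _ in range(n_trials)
    ]

def expected_net_utility(wait, scenarios, alpha, cost_per_sec):
    """Average over scenarios of (value gathered by `wait`) ** alpha
    minus a linear waiting cost (both functional forms are assumed)."""
    total = 0.0
    for scenario in scenarios:
        # Only servers that respond within the wait time contribute.
        gathered = sum(v for t, v in scenario if t <= wait)
        total += gathered ** alpha - cost_per_sec * wait
    return total / len(scenarios)

def optimal_wait(candidates, scenarios, alpha, cost_per_sec):
    """Grid search: return the candidate wait with the highest estimate."""
    return max(
        candidates,
        key=lambda w: expected_net_utility(w, scenarios, alpha, cost_per_sec),
    )

if __name__ == "__main__":
    scenarios = sample_scenarios(n_trials=2000, n_servers=30, mean_response=2.0)
    candidates = [0.5 * k for k in range(21)]  # 0.0 s to 10.0 s
    for alpha in (1.0, 0.5):
        best = optimal_wait(candidates, scenarios, alpha, cost_per_sec=0.5)
        print(f"alpha={alpha}: estimated optimal wait = {best} s")
```

Evaluating every candidate wait time against the same sampled scenarios keeps the comparison between candidates from being dominated by sampling noise. A broker with logs of past retrievals could, under the same assumptions, replace the synthetic draws with empirical response times and relevance scores.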