Validating Query Simulators: An Experiment Using Commercial Searches and Purchases

We design and validate simulators for generating queries and relevance judgments for retrieval system evaluation. We develop a simulation framework that incorporates both existing and new simulation strategies. To validate a simulator, we assess whether evaluation using its output data ranks retrieval systems in the same way as evaluation using real-world data; the real-world data are obtained from logged commercial searches and the associated purchase decisions. While no simulator reproduces an ideal ranking, simulator performance varies widely, which allows us to distinguish the simulators that are better suited to creating artificial testbeds for retrieval experiments. Incorporating knowledge of document structure into the query generation process helps create more realistic simulators.
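To make the validation idea concrete, below is a minimal Python sketch, not the paper's actual framework: it generates a known-item query by sampling terms from a document's fields, with field weights standing in for structure-aware query generation (e.g., favoring titles), and it scores a simulator by the Kendall's tau rank correlation between the system ranking produced on real data and the ranking produced on the simulator's data. The toy corpus, field weights, and system names are illustrative assumptions.

```python
import random
from itertools import combinations

# Toy corpus: each document has structured fields (title, body).
# Purely illustrative, not data from the paper.
CORPUS = {
    "d1": {"title": "wireless noise cancelling headphones",
           "body": "comfortable over-ear design with long battery life"},
    "d2": {"title": "espresso machine with milk frother",
           "body": "stainless steel pump machine for home baristas"},
    "d3": {"title": "trail running shoes",
           "body": "lightweight waterproof shoes with aggressive grip"},
}

def simulate_known_item_query(doc, field_weights, k=3, rng=random):
    """Sample k query terms from one document, preferring some fields.

    field_weights, e.g. {"title": 0.8, "body": 0.2}, encodes the idea
    that a structure-aware simulator draws terms mostly from the title.
    """
    pool, weights = [], []
    for field, weight in field_weights.items():
        for term in doc[field].split():
            pool.append(term)
            weights.append(weight)
    return " ".join(rng.choices(pool, weights=weights, k=k))

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same systems (no ties)."""
    concordant = discordant = 0
    for s1, s2 in combinations(rank_a, 2):
        same_order = (rank_a.index(s1) - rank_a.index(s2)) * \
                     (rank_b.index(s1) - rank_b.index(s2))
        if same_order > 0:
            concordant += 1
        else:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical system rankings: one from real purchase-based evaluation,
# one from evaluation on a simulator's artificial queries and judgments.
real_ranking = ["sysA", "sysB", "sysC", "sysD"]
simulated_ranking = ["sysA", "sysC", "sysB", "sysD"]

query = simulate_known_item_query(CORPUS["d2"], {"title": 0.8, "body": 0.2})
print(f"simulated query for d2: {query!r}")
print("simulator validity (Kendall's tau):",
      kendall_tau(real_ranking, simulated_ranking))
```

A tau near 1 means evaluation on the simulator's data orders systems almost exactly as evaluation on the real-world data does; the spread of such correlations across simulators is what makes the better-suited ones distinguishable.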
