Simulating Search Engines

This paper proposes a simulation methodology for evaluating the performance of large-scale Web search engines hosted in datacenters. Its salient features are: models of parallel computing, which avoid the complexity of simulating hardware and system software in detail; a circulating-tokens approach, which represents sequences of operations that compete for search engine resources; benchmark programs, which measure the cost of relevant operations; and simulations driven by real user traces, which capture the dynamics of user behavior. An experimental evaluation, covering settings that range from clusters of processors to single multithreaded processors, shows that the methodology yields simulation programs that predict performance precisely and efficiently.
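To make the circulating-tokens idea concrete, the sketch below shows one way such a simulation could be organized. It is an illustrative reading of the abstract, not the authors' implementation. Each token stands for a query and carries a sequence of (resource, cost) operations, tokens compete for single-server FIFO resources, and an event queue advances simulated time. The function name `simulate`, the resource names, and all cost values are hypothetical. In the methodology described above, costs would come from benchmark programs and arrival times from real user traces.

```python
import heapq

def simulate(tokens, arrivals):
    """Sketch of a circulating-tokens discrete-event simulation (assumed design).

    tokens:   one list per query of (resource_name, cost) operations, in order.
    arrivals: arrival time of each token (same length as tokens).
    Returns the completion time of each token.
    """
    events = []   # heap of (time, tie_breaker, token_id, next_step)
    seq = 0       # tie-breaker so simultaneous events keep insertion order
    for tid, t in enumerate(arrivals):
        heapq.heappush(events, (t, seq, tid, 0))
        seq += 1

    free_at = {}                  # resource -> time at which it becomes idle
    done = [None] * len(tokens)   # completion time per token

    while events:
        now, _, tid, step = heapq.heappop(events)
        if step == len(tokens[tid]):
            done[tid] = now       # token has finished all its operations
            continue
        resource, cost = tokens[tid][step]
        # The token waits until the resource is idle, then holds it for `cost`.
        start = max(now, free_at.get(resource, 0.0))
        free_at[resource] = start + cost
        heapq.heappush(events, (start + cost, seq, tid, step + 1))
        seq += 1
    return done

# Hypothetical workload: two queries that each need the CPU, then the disk.
workload = [[("cpu", 2.0), ("disk", 1.0)], [("cpu", 2.0), ("disk", 1.0)]]
completion_times = simulate(workload, arrivals=[0.0, 0.0])
```

Because events are processed in non-decreasing time order, each resource serves tokens in the order they request it, so contention appears as queueing delay in the completion times without modeling hardware or system software directly.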
