Adaptive parallelism for web search

A web search query made to Microsoft Bing is currently parallelized by distributing the query processing across many servers. Within each of these servers, however, the query is processed sequentially. Although each server may already be processing multiple queries concurrently, parallelizing the processing of an individual query within a modern multicore server may still improve the user's experience by reducing response time. In this paper, we describe the issues that make parallelizing an individual query within a server challenging, and we present a parallelization approach that effectively addresses them. Because each server may be processing multiple queries concurrently, we also present an adaptive resource management algorithm that chooses the degree of parallelism for each query at run time, taking into account system load and parallelization efficiency. As a result, the servers execute queries with a high degree of parallelism at low loads, gracefully reduce the degree of parallelism as load increases, and choose sequential execution under high load. We have implemented our parallelization approach and adaptive resource management algorithm in Bing servers and evaluated them experimentally with production workloads. The experimental results show that the mean and 95th-percentile response times for queries are reduced by more than 50% under light or moderate load. Moreover, under high load, where parallelization would degrade system performance, response times remain the same as when queries are executed sequentially. In all cases, we observe no degradation in the relevance of the search results.
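To make the idea of choosing a degree of parallelism (DoP) from system load and parallelization efficiency more concrete, here is a minimal sketch of one possible policy. It is not the authors' algorithm. It assumes that load is measured as the number of queries currently in the server, that per-DoP speedups have been profiled offline, and that cores are split evenly among queries. The names `choose_dop` and `EXAMPLE_SPEEDUP`, the fair-share rule, the `min_efficiency` threshold, and all speedup numbers are hypothetical.

```python
from typing import Mapping

# Hypothetical profiled speedups: DoP -> speedup over sequential execution.
# Illustrative numbers only; they are NOT measurements from the paper.
EXAMPLE_SPEEDUP: Mapping[int, float] = {
    1: 1.0, 2: 1.9, 3: 2.7, 4: 3.4, 6: 4.4, 8: 5.0, 12: 5.4,
}


def choose_dop(active_queries: int, num_cores: int,
               speedup: Mapping[int, float],
               min_efficiency: float = 0.5) -> int:
    """Pick a DoP for a newly arriving query (hypothetical policy).

    - Split the cores evenly among the queries already in the server
      plus the new one.
    - Among profiled DoPs that fit in that share, take the largest whose
      parallel efficiency speedup[d] / d is at least min_efficiency.
    - Otherwise fall back to sequential execution (DoP 1).
    """
    if num_cores < 1:
        raise ValueError("num_cores must be positive")
    # Per-query core share; drops to 0 or 1 under heavy load.
    share = num_cores // (max(active_queries, 0) + 1)
    best = 1
    for d in sorted(speedup):
        if d > share:
            break  # larger DoPs would oversubscribe the cores
        if speedup[d] / d >= min_efficiency:
            best = d  # largest DoP so far that is still efficient enough
    return best


if __name__ == "__main__":
    # High DoP at low load, smaller DoP as load grows, sequential at high load.
    for load in (0, 1, 2, 5, 11, 30):
        print(load, choose_dop(load, 12, EXAMPLE_SPEEDUP))
```

With these assumed numbers on 12 cores, an idle server picks DoP 8 rather than 12, because the efficiency at 12 threads (5.4 / 12 = 0.45) falls below the threshold. The chosen DoP then shrinks to 6, 4, and 2 as load rises, and reaches 1 (sequential) once each query's share is a single core or less. This mirrors the qualitative behavior the abstract describes.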
