TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services

[1]  Nathan Clark,et al.  Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications , 2010, ISCA.

[2]  Torsten Suel,et al.  Using graphics processors for high-performance IR query processing , 2008, WWW.

[3]  Patrick Wendell,et al.  Sparrow: distributed, low latency scheduling , 2013, SOSP.

[4]  Alexandros Stamatakis,et al.  Runtime scheduling of dynamic parallelism on accelerator-based multi-core systems , 2007, Parallel Comput.

[5]  Wolfgang Lehner,et al.  Fast integer compression using SIMD instructions , 2010, DaMoN '10.

[6]  Shirish Tatikonda,et al.  Posting list intersection on multicore architectures , 2011, SIGIR.

[7]  Alan L. Cox,et al.  Adaptive parallelism for web search , 2013, EuroSys '13.

[8]  Bradley C. Kuszmaul,et al.  Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.

[9]  Luiz André Barroso,et al.  Web Search for a Planet: The Google Cluster Architecture , 2003, IEEE Micro.

[10]  Ricardo Bianchini,et al.  Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services , 2015, ASPLOS.

[11]  Eitan Frachtenberg,et al.  Reducing Query Latencies in Web Search Using Fine-Grained Parallelism , 2009, World Wide Web.

[12]  Shaolei Ren,et al.  Exploiting Processor Heterogeneity in Interactive Services , 2013, ICAC.

[13]  Sameh Elnikety,et al.  Tians Scheduling: Using Partial Processing in Best-Effort Applications , 2011, 31st International Conference on Distributed Computing Systems.

[14]  Jeffrey Dean,et al.  Challenges in building large-scale information retrieval systems: invited talk , 2009, WSDM '09.

[15]  Christo Wilson,et al.  Better never than late , 2011, SIGCOMM 2011.

[16]  Raj Vaswani,et al.  A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors , 1993, TOCS.

[17]  Vijay Janapa Reddi,et al.  High-performance and energy-efficient mobile web browsing on big/little systems , 2013, IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[18]  Margaret Martonosi,et al.  Characterizing and improving the performance of Intel Threading Building Blocks , 2008, IEEE International Symposium on Workload Characterization.

[19]  Ion Stoica,et al.  The Power of Choice in Data-Aware Cluster Scheduling , 2014, OSDI.

[20]  Craig MacDonald,et al.  Learning to predict response times for online query scheduling , 2012, SIGIR '12.

[21]  Fabrizio Silvestri,et al.  Prefetching query results and its impact on search engines , 2012, SIGIR '12.

[22]  Seung-won Hwang,et al.  Predictive parallelization: taming tail latencies in web search , 2014, SIGIR.

[23]  Arun Raman,et al.  Parallelism orchestration using DoPE: the degree of parallelism executive , 2011, PLDI '11.

[24]  Aristides Gionis,et al.  The impact of caching on search engines , 2007, SIGIR.

[25]  Berkant Barla Cambazoglu,et al.  A refreshing perspective of search engine caching , 2010, WWW '10.

[26]  Stijn Eyerman,et al.  The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism , 2014, ASPLOS.

[27]  T. N. Vijaykumar,et al.  Deadline-aware datacenter TCP (D2TCP) , 2012, SIGCOMM.

[28]  Thomas F. Wenisch,et al.  Power management of online data-intensive services , 2011, 38th Annual International Symposium on Computer Architecture (ISCA).

[29]  Torsten Suel,et al.  Improved techniques for result caching in web search engines , 2009, WWW '09.

[30]  Satish Narayanasamy,et al.  DoublePlay: parallelizing sequential logging and replay , 2011, ASPLOS XVI.

[31]  Gustavo Alonso,et al.  Pydron: Semi-Automatic Parallelization for Multi-Core and the Cloud , 2014, OSDI.

[32]  Kevin Skadron,et al.  Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[33]  Sebastian Burckhardt,et al.  The design of a task parallel library , 2009, OOPSLA 2009.

[34]  Srikanth Kandula,et al.  Speeding up distributed request-response workflows , 2013, SIGCOMM.

[35]  Alfons Kemper,et al.  Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems , 2012, Proc. VLDB Endow.

[36]  Chang-Gun Lee,et al.  Multicore scheduling of parallel real-time tasks with multiple parallelization options , 2015, 21st IEEE Real-Time and Embedded Technology and Applications Symposium.

[37]  Ronald G. Dreslinski,et al.  Adrenaline: Pinpointing and reining in tail queries with quick voltage boosting , 2015, IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[38]  Wolfgang Lehner,et al.  Fast Sorted-Set Intersection using SIMD Instructions , 2011, ADMS@VLDB.

[39]  Yuxiong He,et al.  Provably Efficient Online Nonclairvoyant Adaptive Scheduling , 2008, IEEE Trans. Parallel Distributed Syst.