Hurry-up: Scaling Web Search on Big/Little Multi-core Architectures

Heterogeneous multi-core systems such as big/little architectures have been introduced as an attractive server design option with the potential to improve performance under power constraints in data centres. Since high-performing big cores and power-efficient little cores share the processing of the workload on the same system, thread mapping/scheduling becomes considerably more challenging. It is particularly hard to account for the different trade-offs that the heterogeneous cores impose on the quality-of-service (expressed as tail latency) experienced by user-facing applications, such as Web Search. In this work, we present Hurry-up, a runtime thread mapping solution that selects individual requests to run on the most appropriate heterogeneous cores to improve tail latency. Hurry-up accelerates compute-intensive requests on big cores, while letting less intensive threads execute on little cores. We implement and deploy Hurry-up on a real 64-bit big/little architecture (ARM Juno), and show that, compared to a conservative policy on Linux, Hurry-up reduces the server tail latency by 39.5% (mean).
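
To make the mapping idea concrete, the following is a minimal C sketch, not the authors' implementation, of per-request thread placement on a big/little Linux system: a thread handling a request predicted to be compute-intensive is pinned to the big cluster, otherwise it stays on the little cluster. The core IDs, the service-time predictor, and the slack threshold are illustrative assumptions only.

/*
 * Illustrative sketch of per-request big/little thread mapping.
 * Assumed ARM Juno-like layout: little Cortex-A53 cores on CPUs 0-3,
 * big Cortex-A57 cores on CPUs 4-5 (placeholder IDs, not from the paper).
 */
#define _GNU_SOURCE
#include <sched.h>
#include <pthread.h>

static const int LITTLE_CPUS[] = {0, 1, 2, 3};  /* assumed little-core IDs */
static const int BIG_CPUS[]    = {4, 5};        /* assumed big-core IDs */

/* Pin the calling thread to one cluster via its CPU affinity mask. */
static int pin_to_cluster(const int *cpus, int ncpus)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int i = 0; i < ncpus; i++)
        CPU_SET(cpus[i], &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/*
 * Map a request before processing it: "hurry up" requests whose predicted
 * service time threatens the latency target by running them on big cores,
 * and keep cheap requests on the power-efficient little cores.
 * predicted_service_ms and slack_ms are hypothetical inputs; a real system
 * would derive them from request features and the tail-latency target.
 */
void map_request(double predicted_service_ms, double slack_ms)
{
    if (predicted_service_ms > slack_ms)
        pin_to_cluster(BIG_CPUS, 2);
    else
        pin_to_cluster(LITTLE_CPUS, 4);
}

The sketch only captures the placement mechanism (CPU affinity per request-handling thread); the interesting part of such a policy lies in how the per-request intensity estimate and the latency slack are obtained at runtime.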
