Octopus-Man: QoS-driven task management for heterogeneous multicores in warehouse-scale computers

Heterogeneous multicore architectures have the potential to improve energy efficiency by integrating power-efficient wimpy cores with high-performing brawny cores. However, it is an open question as how to deliver energy reduction while ensuring the quality of service (QoS) of latency-sensitive web-services running on such heterogeneous multicores in warehouse-scale computers (WSCs). In this work, we first investigate the implications of heterogeneous multicores in WSCs and show that directly adopting heterogeneous multicores without re-designing the software stack to provide QoS management leads to significant QoS violations. We then present Octopus-Man, a novel QoS-aware task management solution that dynamically maps latency-sensitive tasks to the least power-hungry processing resources that are sufficient to meet the QoS requirements. Using carefully-designed feedback-control mechanisms, Octopus-Man addresses critical challenges that emerge due to uncertainties in workload fluctuations and adaptation dynamics in a real system. Our evaluation using web-search and memcached running on a real-system Intel heterogeneous prototype demonstrates that Octopus-Man improves energy efficiency by up to 41% (CPU power) and up to 15% (system power) over an all-brawny WSC design while adhering to specified QoS targets.

[1]  Gene F. Franklin,et al.  Digital control of dynamic systems , 1980 .

[2]  Gene F. Franklin,et al.  Feedback Control of Dynamic Systems , 1986 .

[3]  Luiz André Barroso,et al.  Web Search for a Planet: The Google Cluster Architecture , 2003, IEEE Micro.

[4]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[5]  Yixin Diao,et al.  Feedback Control of Computing Systems , 2004 .

[6]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[7]  Sang Hyuk Son,et al.  Feedback Control Architecture and Design Methodology for Service Delay Guarantees in Web Servers , 2006, IEEE Transactions on Parallel and Distributed Systems.

[8]  Xue Liu,et al.  Dynamic Voltage Scaling in Multitier Web Servers with End-to-End Delay Control , 2007, IEEE Transactions on Computers.

[9]  Luiz André Barroso,et al.  The Case for Energy-Proportional Computing , 2007, Computer.

[10]  Wolf-Dietrich Weber,et al.  Power provisioning for a warehouse-sized computer , 2007, ISCA '07.

[11]  Christoforos E. Kozyrakis,et al.  A Comparison of High-Level Full-System Power Models , 2008, HotPower.

[12]  Amar Phanishayee,et al.  FAWN: a fast array of wimpy nodes , 2009, SOSP '09.

[13]  Vanish Talwar,et al.  Power Management of Datacenter Workloads Using Per-Core Power Gating , 2009, IEEE Computer Architecture Letters.

[14]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[15]  Manuel Prieto,et al.  Maximizing Power Efficiency with Asymmetric Multicore Systems , 2009, ACM Queue.

[16]  Urs Hölzle,et al.  Brawny cores still beat wimpy cores, most of the time , 2010 .

[17]  Tong Li,et al.  Operating system support for overlapping-ISA heterogeneous multi-core architectures , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[18]  Luiz André Barroso,et al.  Guest Editors' Introduction: Datacenter-Scale Computing , 2010, IEEE Micro.

[19]  Kushagra Vaid,et al.  Web search using mobile cores: quantifying and mitigating the price of efficiency , 2010, ISCA.

[20]  Chita R. Das,et al.  Towards characterizing cloud backend workloads: insights from Google compute clusters , 2010, PERV.

[21]  Margaret Martonosi,et al.  Capping the brown energy consumption of Internet services at low cost , 2010, International Conference on Green Computing.

[22]  Gerhard Wellein,et al.  LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments , 2010, 2010 39th International Conference on Parallel Processing Workshops.

[23]  Parthasarathy Ranganathan Recipe for efficiency: principles of power-aware computing , 2010, CACM.

[24]  Sriram Sankar,et al.  Server Engineering Insights for Large-Scale Online Services , 2010, IEEE Micro.

[25]  Gang Ren,et al.  Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers , 2010, IEEE Micro.

[26]  Dheeraj Reddy,et al.  Bias scheduling in heterogeneous multi-core architectures , 2010, EuroSys '10.

[27]  Pradip Bose,et al.  A case for guarded power gating for multi-core processors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[28]  Norman P. Jouppi,et al.  System-level integrated server architectures for scale-out datacenters , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[29]  Kushagra Vaid,et al.  Mobile processors for energy-efficient web search , 2011, TOCS.

[30]  Tajana Rosing,et al.  Utilizing green energy prediction to schedule mixed batch and service jobs in data centers , 2011, OPSR.

[31]  Xi Yang,et al.  Looking back on the language and hardware revolutions: measured power, performance, and scaling , 2011, ASPLOS XVI.

[32]  Kevin Skadron,et al.  Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[33]  Thomas F. Wenisch,et al.  Power management of online data-intensive services , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[34]  Lingjia Tang,et al.  Heterogeneity in “Homogeneous” Warehouse-Scale Computers: A Performance Opportunity , 2011, IEEE Computer Architecture Letters.

[35]  Lingjia Tang,et al.  Increasing Utilization in Modern Warehouse-Scale Computers Using Bubble-Up , 2012, IEEE Micro.

[36]  Lieven Eeckhout,et al.  Scheduling heterogeneous multi-cores through performance impact estimation (PIE) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[37]  Li Zhao,et al.  QuickIA: Exploring heterogeneous architectures on real prototypes , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[38]  Daniel Wong,et al.  KnightShift: Scaling the Energy Proportionality Wall through Server-Level Heterogeneity , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[39]  Babak Falsafi,et al.  Scale-out processors , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[40]  Eitan Frachtenberg Holistic Datacenter Design in the Open Compute Project , 2012, Computer.

[41]  Manuel Prieto,et al.  Leveraging Core Specialization via OS Scheduling to Improve Performance on Asymmetric Multicore Systems , 2012, TOCS.

[42]  Christoforos E. Kozyrakis,et al.  Towards energy-proportional datacenter memory with mobile DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[43]  Jason Cong,et al.  Energy-efficient scheduling on heterogeneous multi-core architectures , 2012, ISLPED '12.

[44]  Babak Falsafi,et al.  Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.

[45]  Wei Wang,et al.  ReQoS: reactive static/dynamic compilation for QoS in warehouse scale computers , 2013, ASPLOS '13.

[46]  Benjamin C. Lee,et al.  Navigating heterogeneous processors with market mechanisms , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[47]  Michael Abd-El-Malek,et al.  Omega: flexible, scalable schedulers for large compute clusters , 2013, EuroSys '13.

[48]  Lingjia Tang,et al.  Whare-map: heterogeneity in "homogeneous" warehouse-scale computers , 2013, ISCA.

[49]  Thu D. Nguyen,et al.  Parasol and GreenSwitch: managing datacenters powered by renewable energy , 2013, ASPLOS '13.

[50]  Christina Delimitrou,et al.  Paragon: QoS-aware scheduling for heterogeneous datacenters , 2013, ASPLOS '13.

[51]  Thomas F. Wenisch,et al.  Thin servers with smart pipes: designing SoC accelerators for memcached , 2013, ISCA.

[52]  Lingjia Tang,et al.  Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers , 2013, ISCA.

[53]  Xiao Zhang,et al.  Optimizing Google's warehouse scale computers: The NUMA experience , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[54]  Luiz André Barroso,et al.  The tail at scale , 2013, CACM.

[55]  Lingjia Tang,et al.  SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[56]  Christina Delimitrou,et al.  Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.

[57]  Gene F. Franklin,et al.  Digital Control Of Dynamic Systems 3rd Edition , 2014 .

[58]  Xiao Zhang,et al.  HaPPy: Hyperthread-aware Power Profiling Dynamically , 2014, USENIX Annual Technical Conference.

[59]  Lingjia Tang,et al.  Protean Code: Achieving Near-Free Online Code Transformations for Warehouse Scale Computers , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[60]  Christoforos E. Kozyrakis,et al.  Towards energy proportionality for large-scale latency-critical workloads , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).