iBench: Quantifying interference for datacenter applications

Interference between co-scheduled applications is one of the major reasons modern datacenters (DCs) operate at low utilization. DC operators traditionally sidestep interference either by disallowing colocation altogether and providing isolated server instances, or by requiring users to express resource reservations, which are often exaggerated to counterbalance the unpredictable quality of the allocated resources. Understanding, reducing and managing interference can significantly change how these large-scale systems operate. We present iBench, a novel workload suite that helps quantify the pressure different applications put on various shared resources, as well as the pressure they can tolerate in these resources. iBench consists of a set of carefully crafted benchmarks that induce interference of increasing intensity in resources spanning the CPU, cache hierarchy, memory, storage and networking subsystems. We first validate the effect of iBench workloads on the performance of a wide spectrum of DC applications. Then, we use iBench to demonstrate the importance of considering interference in a set of challenging problems ranging from DC scheduling and server provisioning to resource-efficient application development and scheduling for heterogeneous CMPs. In all cases, quantifying interference with iBench results in significant performance and/or efficiency improvements. We plan to release iBench under a free software license.
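One way to read "the pressure an application can tolerate" is as the highest interference intensity at which it still meets its quality-of-service target. The sketch below illustrates that idea with an intensity sweep. It is a minimal illustration under stated assumptions, not iBench's interface: `tolerated_intensity` and the `run_with_interference` hook are hypothetical, the 5% QoS slack and 10% intensity steps are arbitrary, and the early exit assumes performance degrades monotonically as intensity grows.

```python
from typing import Callable, Optional, Sequence


def tolerated_intensity(
    run_with_interference: Callable[[int], float],
    baseline: float,
    qos_slack: float = 0.05,
    levels: Sequence[int] = range(0, 101, 10),
) -> Optional[int]:
    """Return the highest intensity level (percent) at which the victim's
    performance stays within `qos_slack` of its isolated `baseline`.

    `run_with_interference(level)` is a HYPOTHETICAL hook: it should run the
    victim next to a contentious kernel at `level` percent intensity and
    return a higher-is-better metric (e.g. throughput).
    Returns None if even the lowest level already violates QoS.
    """
    threshold = baseline * (1.0 - qos_slack)
    tolerated: Optional[int] = None
    for level in levels:
        perf = run_with_interference(level)
        if perf < threshold:
            # First QoS violation. Assuming monotonic degradation,
            # higher intensities would violate as well, so stop here.
            break
        tolerated = level
    return tolerated


if __name__ == "__main__":
    # Illustrative synthetic victim: throughput drops 0.3 units per
    # percent of interference intensity (made-up model, not measured data).
    print(tolerated_intensity(lambda lvl: 100.0 - 0.3 * lvl, baseline=100.0))
```

With this synthetic model the victim stays above 95% of baseline at 10% intensity but falls below it at 20%, so the sweep reports 10. Measuring the pressure an application *causes* would be the mirror image: run a calibrated victim next to the application and observe its slowdown.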
