uqSim: Scalable and Validated Simulation of Cloud Microservices

Current cloud services are moving away from monolithic designs and towards graphs of many loosely-coupled, single-concerned microservices. Microservices have several advantages, including speeding up development and deployment, allowing specialization of the software infrastructure, and helping with debugging and error isolation. At the same time they introduce several hardware and software challenges. Given that most of the performance and efficiency implications of microservices happen at scales larger than what is available outside production deployments, studying such effects requires designing the right simulation infrastructures. We present uqSim, a scalable and validated queueing network simulator designed specifically for interactive microservices. uqSim provides detailed intra- and inter-microservice models that allow it to faithfully reproduce the behavior of complex, many-tier applications. uqSim is also modular, allowing reuse of individual models across microservices and end-to-end applications. We have validated uqSim both against simple and more complex microservices graphs, and have shown that it accurately captures performance in terms of throughput and tail latency. Finally, we use uqSim to model the tail at scale effects of request fanout, and the performance impact of power management in latency-sensitive microservices.

[1]  Christina Delimitrou,et al.  Workload characterization of interactive cloud services on big and small server platforms , 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).

[2]  Junjie Wu,et al.  BigHouse: A simulation infrastructure for data center systems , 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.

[3]  Christoforos E. Kozyrakis,et al.  Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[4]  Christina Delimitrou,et al.  Tarcil: reconciling scheduling speed and quality in large shared clusters , 2015, SoCC.

[5]  Luiz André Barroso,et al.  Warehouse-Scale Computing: Entering the Teenage Decade , 2011, SIGARCH Comput. Archit. News.

[6]  Christina Delimitrou,et al.  HCloud: Resource-Efficient Provisioning in Shared Cloud Systems , 2016, ASPLOS.

[7]  Randy H. Katz,et al.  Heterogeneity and dynamicity of clouds at scale: Google trace analysis , 2012, SoCC '12.

[8]  Shengli Zhou,et al.  Aqua-Sim: An NS-2 based simulator for underwater sensor networks , 2009, OCEANS 2009.

[9]  Thomas F. Wenisch,et al.  PowerNap: eliminating server idle power , 2009, ASPLOS.

[10]  Jun Sun,et al.  Poster: Benchmarking Microservice Systems for Software Engineering Research , 2018, 2018 IEEE/ACM 40th International Conference on Software Engineering: Companion (ICSE-Companion).

[11]  M. Gerla,et al.  GloMoSim: a library for parallel simulation of large-scale wireless networks , 1998, Proceedings. Twelfth Workshop on Parallel and Distributed Simulation PADS '98 (Cat. No.98TB100233).

[12]  Ronald G. Dreslinski,et al.  Adrenaline: Pinpointing and reining in tail queries with quick voltage boosting , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[13]  Christina Delimitrou,et al.  Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.

[14]  Mor Harchol-Balter,et al.  Performance Modeling and Design of Computer Systems: Queueing Theory in Action , 2013 .

[15]  Ripal Nathuji,et al.  Exploiting Platform Heterogeneity for Power Efficient Data Centers , 2007, Fourth International Conference on Autonomic Computing (ICAC'07).

[16]  Takuya Nakaike,et al.  Workload characterization for microservices , 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).

[17]  Thomas R. Henderson,et al.  Network Simulations with the ns-3 Simulator , 2008 .

[18]  Christina Delimitrou,et al.  The Architectural Implications of Cloud Microservices , 2018, IEEE Computer Architecture Letters.

[19]  Yuan He,et al.  An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems , 2019, ASPLOS.

[20]  Lingjia Tang,et al.  Whare-map: heterogeneity in "homogeneous" warehouse-scale computers , 2013, ISCA.

[21]  Christina Delimitrou,et al.  Paragon: QoS-aware scheduling for heterogeneous datacenters , 2013, ASPLOS '13.

[22]  Teerawat Issariyakul,et al.  Introduction to Network Simulator NS2 , 2008 .

[23]  Christina Delimitrou,et al.  Quality-of-Service-Aware Scheduling in Heterogeneous Data centers with Paragon , 2014, IEEE Micro.

[24]  Thomas F. Wenisch,et al.  μ Suite: A Benchmark Suite for Microservices , 2018, 2018 IEEE International Symposium on Workload Characterization (IISWC).

[25]  Lingjia Tang,et al.  Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers , 2013, ISCA.

[26]  Kevin Skadron,et al.  Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[27]  Christina Delimitrou,et al.  QoS-Aware scheduling in heterogeneous datacenters with paragon , 2013, TOCS.

[28]  Christoforos E. Kozyrakis,et al.  Towards energy proportionality for large-scale latency-critical workloads , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[29]  Luiz André Barroso,et al.  The tail at scale , 2013, CACM.

[30]  Christina Delimitrou,et al.  Bolt: I Know What You Did Last Summer... In The Cloud , 2017, ASPLOS.

[31]  Christina Delimitrou,et al.  Seer : Leveraging Big Data to Navigate The Complexity of Cloud Debugging , 2018 .

[32]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[33]  Yuan He,et al.  Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices , 2019, ASPLOS.

[34]  Aman Kansal,et al.  Q-clouds: managing performance interference effects for QoS-aware clouds , 2010, EuroSys '10.

[35]  Leonard Kleinrock,et al.  Queueing Systems: Volume I-Theory , 1975 .

[36]  Christina Delimitrou,et al.  QoS-Aware Admission Control in Heterogeneous Datacenters , 2013, ICAC.