Rusty: Runtime Interference-Aware Predictive Monitoring for Modern Multi-Tenant Systems

Modern micro-service and container-based cloud-native applications have leveraged multi-tenancy as a first class system design concern. The increasing number of co-located services/workloads into server facilities stresses resource availability and system capability in an unconventional and unpredictable manner. To efficiently manage resources in such dynamic environments, run-time observability and forecasting are required to capture workload sensitivities under differing interference effects, according to applied co-location scenarios. While several research efforts have emerged on interference-aware performance modelling, they are usually applied at a very coarse-grained manner e.g., estimating the overall performance degradation of an application, thus failing to effectively quantify, predict or provide educated insights on the impact of continuous runtime interference on per-resource allocations. In this paper, we present Rusty, a predictive monitoring system that leverages the power of Long Short-Term Memory networks to enable fast and accurate runtime forecasting of key performance metrics and resource stresses of cloud-native applications under interference. We evaluate Rusty under a diverse set of interference scenarios for a plethora of representative cloud workloads, showing that Rusty i) achieves extremely high prediction accuracy, average <inline-formula><tex-math notation="LaTeX">$R^2$</tex-math><alternatives><mml:math><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math><inline-graphic xlink:href="masouros-ieq1-3013948.gif"/></alternatives></inline-formula> value of 0.98, ii) enables very deep prediction horizons retaining high accuracy, e.g., <inline-formula><tex-math notation="LaTeX">$R^2$</tex-math><alternatives><mml:math><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math><inline-graphic xlink:href="masouros-ieq2-3013948.gif"/></alternatives></inline-formula> of around 0.99 for a horizon of 1 sec ahead and around 0.94 for an horizon of 5 sec ahead, while iii) satisfying, at the same time, the strict latency constraints required to make Rusty practical for continuous predictive monitoring at runtime.

[1]  Comparing Program Phase Detection Techniques , 2003, MICRO.

[2]  Jack J. Dongarra,et al.  Collecting Performance Data with PAPI-C , 2009, Parallel Tools Workshop.

[3]  Bin Sun,et al.  CounterMiner: Mining Big Performance Data from Hardware Counters , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Bowen Zhou,et al.  Pythia: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads , 2018, Middleware.

[7]  Ion Stoica,et al.  Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics , 2016, NSDI.

[8]  Christoforos E. Kozyrakis,et al.  Towards energy proportionality for large-scale latency-critical workloads , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[9]  Razvan Pascanu,et al.  How to Construct Deep Recurrent Neural Networks , 2013, ICLR.

[10]  Gbadebo Ayoade,et al.  A Survey on Hypervisor-Based Monitoring , 2015, ACM Comput. Surv..

[11]  Ronald G. Dreslinski,et al.  Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers , 2015, ASPLOS.

[12]  Christina Delimitrou,et al.  Paragon: QoS-aware scheduling for heterogeneous datacenters , 2013, ASPLOS '13.

[13]  Avi Mendelson,et al.  Deep-dive analysis of the data analytics workload in CloudSuite , 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).

[14]  Eric A. Brewer,et al.  Borg, Omega, and Kubernetes , 2016, ACM Queue.

[15]  Yuan He,et al.  Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices , 2019, ASPLOS.

[16]  Eduard Ayguadé,et al.  Decomposable and responsive power models for multicore processors using performance counters , 2010, ICS '10.

[17]  Gu-Yeon Wei,et al.  Profiling a warehouse-scale computer , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[18]  Margaret Martonosi,et al.  Phase characterization for power: evaluating control-flow-based and event-counter-based techniques , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[19]  Brad Fitzpatrick,et al.  Distributed caching with memcached , 2004 .

[20]  Rahul Khanna,et al.  RAPL: Memory power estimation and capping , 2010, 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED).

[21]  Xiao Yu,et al.  CloudSeer: Workflow Monitoring of Cloud Infrastructures via Interleaved Logs , 2016, ASPLOS.

[22]  Chunjie Luo,et al.  Characterizing data analysis workloads in data centers , 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).

[23]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[24]  Yogesh D. Barve,et al.  FECBench: A Holistic Interference-aware Approach for Application Performance Modeling , 2019, 2019 IEEE International Conference on Cloud Engineering (IC2E).

[25]  Li Shen,et al.  PPEP: Online Performance, Power, and Energy Prediction Framework and DVFS Space Exploration , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[26]  Yuqing Zhu,et al.  BigDataBench: A big data benchmark suite from internet services , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[27]  Kevin Skadron,et al.  Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[28]  Wentong Cai,et al.  GAugur: Quantifying Performance Interference of Colocated Games for Improving Resource Utilization in Cloud Gaming , 2019, HPDC.

[29]  Christoforos E. Kozyrakis,et al.  Learning Memory Access Patterns , 2018, ICML.

[30]  Sriram Sankar,et al.  Server Engineering Insights for Large-Scale Online Services , 2010, IEEE Micro.

[31]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[32]  Alexandra Fedorova,et al.  A case for NUMA-aware contention management on multicore systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[33]  Nam Sung Kim,et al.  SleepScale: Runtime joint speed scaling and sleep states management for power efficient data centers , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[34]  Stijn Eyerman,et al.  Per-thread cycle accounting in multicore processors , 2013, TACO.

[35]  Christina Delimitrou,et al.  Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.

[36]  Jing Guo,et al.  Who Limits the Resource Efficiency of My Datacenter: An Analysis of Alibaba Datacenter Traces , 2019, 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS).

[37]  Christoforos E. Kozyrakis,et al.  AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).

[38]  Mattan Erez,et al.  Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems , 2016, ASPLOS.

[39]  Ravi Iyer,et al.  Cache QoS: From concept to reality in the Intel® Xeon® processor E5-2600 v3 product family , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[40]  Tipp Moseley,et al.  Measuring interference between live datacenter applications , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[41]  Christina Delimitrou,et al.  PARTIES: QoS-Aware Resource Partitioning for Multiple Interactive Services , 2019, ASPLOS.

[42]  Bin Li,et al.  Dynamo: Facebook's Data Center-Wide Power Management System , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[43]  Simon Fraser User-level scheduling on NUMA multicore systems under Linux , 2011 .

[44]  Christoforos E. Kozyrakis,et al.  Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[45]  Lingjia Tang,et al.  Enabling fair pricing on high performance computer systems with node sharing , 2014, HiPC 2014.

[46]  Babak Falsafi,et al.  Proactive instruction fetch , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[47]  Babak Falsafi,et al.  Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.

[48]  Aman Kansal,et al.  Q-clouds: managing performance interference effects for QoS-aware clouds , 2010, EuroSys '10.

[49]  Christina Delimitrou,et al.  iBench: Quantifying interference for datacenter applications , 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).

[50]  Onur Mutlu,et al.  The application slowdown model: Quantifying and controlling the impact of inter-application interference at shared caches and main memory , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[51]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[52]  Randy H. Katz,et al.  Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.

[53]  Yiorgos Makris,et al.  Workload characterization and prediction: A pathway to reliable multi-core systems , 2015, 2015 IEEE 21st International On-Line Testing Symposium (IOLTS).

[54]  Lingjia Tang,et al.  GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks , 2019, EuroSys.

[55]  Sherief Reda,et al.  Pack & Cap: Adaptive DVFS and thread packing under power caps , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[56]  Henry Hoffmann,et al.  ESP: A Machine Learning Approach to Predicting Application Interference , 2017, 2017 IEEE International Conference on Autonomic Computing (ICAC).

[57]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[58]  Lingjia Tang,et al.  Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers , 2013, ISCA.

[59]  Dirk Merkel,et al.  Docker: lightweight Linux containers for consistent development and deployment , 2014 .

[60]  Cristinel Ababei,et al.  Investigation of LSTM based prediction for dynamic energy management in chip multiprocessors , 2017, 2017 Eighth International Green and Sustainable Computing Conference (IGSC).

[61]  Martin Schulz,et al.  Enabling fair pricing on high performance computer systems with node sharing , 2014, Sci. Program..

[62]  Alexandra Fedorova,et al.  Contention-Aware Scheduling on Multicore Systems , 2010, TOCS.

[63]  Feifei Li,et al.  ATOM: Efficient Tracking, Monitoring, and Orchestration of Cloud Resources , 2017, IEEE Transactions on Parallel and Distributed Systems.

[64]  Noel De Palma,et al.  Online metrics prediction in monitoring systems , 2018, IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS).