Rusty: Runtime System Predictability Leveraging LSTM Neural Networks

Modern cloud-scale data centers are adopting workload co-location as an effective mechanism for improving resource utilization. However, co-location stresses resource availability in unconventional and unpredictable ways. Efficient resource management requires continuous, and ideally predictive, runtime knowledge of system metrics that are sensitive both to workload demands (e.g., CPU, memory) and to interference effects induced by co-location. In this paper, we present Rusty, a framework that addresses these challenges by leveraging Long Short-Term Memory (LSTM) networks to forecast, at runtime, performance metrics of applications executing on systems under interference. We evaluate Rusty under a diverse set of interference scenarios across a wide range of cloud workloads, showing that it achieves very high prediction accuracy, up to 0.99 in terms of $R^2$, while satisfying the strict latency constraints required for runtime use.
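To make the forecasting idea concrete, the following is a minimal NumPy sketch of a single LSTM cell rolled over a window of recent performance-counter samples to produce a one-step-ahead estimate. The layer sizes, the stacked gate parameterization, and the linear readout are illustrative assumptions, not Rusty's actual architecture or trained weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Single LSTM cell with one stacked weight matrix for the
    input/forget/cell/output gates (illustrative, untrained weights)."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        # Gates are computed jointly, then split into i, f, g, o.
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)   # updated cell state
        h = o * np.tanh(c)           # new hidden state
        return h, c

def forecast(cell, W_out, window):
    """Roll the cell over a window of metric vectors and return a
    one-step-ahead forecast via a linear readout of the hidden state."""
    h = c = np.zeros(cell.n_hidden)
    for x in window:
        h, c = cell.step(x, h, c)
    return W_out @ h

# Toy usage: 8 timesteps of 4 performance counters -> forecast of 4 counters.
rng = np.random.default_rng(1)
cell = LSTMCell(n_in=4, n_hidden=16)
W_out = rng.normal(0.0, 0.1, (4, 16))
window = rng.normal(size=(8, 4))
pred = forecast(cell, W_out, window)
print(pred.shape)  # (4,)
```

In practice such a model would be trained offline on traces of hardware counters collected under interference and then queried online, where the per-step cost of a small recurrent cell is what makes runtime use feasible.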
