DRMaestro: orchestrating disaggregated resources on virtualized data-centers

Modern applications demand resources at an unprecedented scale, and data centers must scale efficiently to cope with that demand. Resource disaggregation has the potential to improve resource efficiency by allowing workloads to be deployed more flexibly. The industry is therefore shifting towards disaggregated architectures, which enable new ways to structure hardware resources in data centers. However, determining the best-performing resource provisioning is a complicated task: the optimality of resource allocation in a disaggregated data center depends on its topology and on workload collocation. This paper presents DRMaestro, a framework that orchestrates disaggregated resources transparently to applications. DRMaestro uses a novel flow-network model to determine the optimal placement in multiple phases, while making a best effort to prevent workload performance interference. We first evaluate the impact of disaggregation in terms of its additional network requirements under higher network load. The results show that for some applications the impact is minimal, while others can suffer up to an 80% slowdown in the data-transfer part. We then evaluate DRMaestro with a real prototype on Kubernetes and with a trace-driven simulation. The results show that DRMaestro can reduce the total job makespan with a speedup of up to ≈1.20x and decrease QoS violations by up to ≈2.64x compared with another orchestrator that does not support resource disaggregation.
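
To make the idea of flow-based placement concrete, the following is a minimal sketch of how a placement problem can be cast as a minimum-cost flow. It is not DRMaestro's actual model, which the abstract does not detail. The job names, machine names, slot counts, and cost values are all hypothetical, and the solver is a plain successive-shortest-paths routine chosen for brevity rather than the algorithm the framework uses. Each job sends one unit of flow from a source, through the machine it is placed on, to a sink; edge costs stand in for the expected penalty of running a job locally or on a disaggregated (remote) resource.

```python
from collections import deque


class MinCostFlow:
    """Successive shortest paths with a queue-based Bellman-Ford search."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        # Edge record: [target, residual capacity, cost, index of reverse edge]
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        total_flow = total_cost = 0
        while True:
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if prev[t] is None:  # no augmenting path left
                return total_flow, total_cost
            # Find the bottleneck capacity along the cheapest path.
            push, v = float("inf"), t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            # Augment along the path and update residual capacities.
            v = t
            while v != s:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            total_flow += push
            total_cost += push * dist[t]


def place(jobs, machines, cost):
    """jobs: list of names; machines: {name: free slots}; cost: {(job, machine): int}."""
    job_id = {j: 1 + k for k, j in enumerate(jobs)}
    mach_id = {m: 1 + len(jobs) + k for k, m in enumerate(machines)}
    id_to_mach = {i: m for m, i in mach_id.items()}
    source, sink = 0, 1 + len(jobs) + len(machines)
    net = MinCostFlow(sink + 1)
    for j in jobs:
        net.add_edge(source, job_id[j], 1, 0)
    for (j, m), c in cost.items():
        net.add_edge(job_id[j], mach_id[m], 1, c)
    for m, slots in machines.items():
        net.add_edge(mach_id[m], sink, slots, 0)
    flow, total = net.solve(source, sink)
    placement = {}
    for j in jobs:
        for v, cap, _, _ in net.graph[job_id[j]]:
            if v in id_to_mach and cap == 0:  # saturated job -> machine edge
                placement[j] = id_to_mach[v]
    return placement, flow, total


# Hypothetical inputs: job A is sensitive to network latency, B and C are not.
jobs = ["A", "B", "C"]
machines = {"local": 1, "remote": 2}
cost = {("A", "local"): 1, ("A", "remote"): 5,
        ("B", "local"): 1, ("B", "remote"): 2,
        ("C", "local"): 1, ("C", "remote"): 2}
placement, flow, total = place(jobs, machines, cost)
print(placement, flow, total)
```

In this toy instance the single local slot goes to the job with the highest remote penalty, and the other two jobs are sent to the remote pool. A real orchestrator would derive the costs from topology and interference measurements instead of fixed constants.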
