Distributed and Optimal RDMA Resource Scheduling in Shared Data Center Networks

Remote Direct Memory Access (RDMA) suffers from unfairness and performance degradation when multiple applications share RDMA network resources. An efficient resource scheduling mechanism is therefore needed to allocate RDMA resources optimally among applications. However, traditional solutions based on Network Utility Maximization (NUM) are inadequate for RDMA because of three challenges: 1) standard NUM-oriented algorithms cannot handle the coupling variables introduced by multiple dependent RDMA operations; 2) the stringent constraints on RDMA on-board resources complicate the standard NUM formulation by adding extra optimization dimensions; 3) naively applying traditional NUM algorithms to a large-scale RDMA resource scheduling problem leads to scalability and convergence issues.
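For context, a minimal sketch of the classical "traditional" NUM approach may help: dual (price-based) decomposition for maximizing the sum of weighted logarithmic utilities, sum of w_s * log(x_s), subject only to link capacity constraints. This is not the scheduling algorithm proposed here. The function name `num_dual_decomposition`, the weighted-log utility, the step size, and the iteration count are illustrative assumptions. Note what the sketch cannot express: each flow's rate is a single independent variable and the only constraints are link capacities, so there is no room for coupled variables across dependent RDMA operations or for on-board resource limits, and its fixed-step price updates can converge slowly.

```python
from typing import List


def num_dual_decomposition(
    routing: List[List[int]],  # routing[l][s] == 1 if flow s traverses link l
    capacity: List[float],     # capacity c_l of each link l
    weights: List[float],      # w_s in the assumed utility U_s(x) = w_s * log(x)
    step: float = 0.01,        # fixed price step size (illustrative choice)
    iters: int = 20000,        # number of price iterations (illustrative choice)
    x_max: float = 1e3,        # cap on a rate when its path price is zero
) -> List[float]:
    """Sketch of standard NUM solved by dual (price-based) decomposition."""
    n_links, n_flows = len(capacity), len(weights)
    prices = [1.0] * n_links  # Lagrange multipliers, one per link
    rates = [0.0] * n_flows
    for _ in range(iters):
        # Source step: flow s sees its path price q_s and maximizes
        # w_s*log(x) - q_s*x, whose solution is x = w_s / q_s.
        for s in range(n_flows):
            q = sum(prices[l] for l in range(n_links) if routing[l][s])
            rates[s] = min(weights[s] / q, x_max) if q > 0 else x_max
        # Link step: raise the price when the link is overloaded and lower it
        # otherwise, projecting onto prices >= 0.
        for l in range(n_links):
            load = sum(rates[s] for s in range(n_flows) if routing[l][s])
            prices[l] = max(0.0, prices[l] + step * (load - capacity[l]))
    return rates


if __name__ == "__main__":
    # Two unit-capacity links; flow 0 crosses both, flows 1 and 2 cross one each.
    print(num_dual_decomposition([[1, 1, 0], [1, 0, 1]], [1.0, 1.0], [1.0, 1.0, 1.0]))
```

In the two-link example, the proportionally fair optimum gives the long flow 1/3 and each one-hop flow 2/3, and the iterates approach these values.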
