sfc2cpu: Operating a Service Function Chain Platform with Neural Combinatorial Optimization

Service Function Chaining realized with a microservice-based architecture results in an increased number of computationally cheap Virtual Network Functions (VNFs). Pinning cheap VNFs to dedicated CPU cores can waste resources, since not every VNF fully utilizes its core. Cheap VNFs should therefore share CPU cores to improve resource utilization. sfc2cpu learns efficient VNF-to-core assignments that increase throughput and reduce latency compared to three baseline algorithms. To optimize VNF assignments, sfc2cpu combines game theory with Neural Combinatorial Optimization in a novel way. Measurements in a real hardware testbed show that sfc2cpu increases throughput by up to 36% and reduces latency by up to 59% compared to Round Robin. We show that sfc2cpu can be incrementally deployed and easily integrated into existing infrastructures.
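To make the core-sharing problem concrete, the following minimal sketch contrasts a Round Robin assignment with a simple load-aware greedy heuristic. It is an illustration only: the per-VNF load values (percent of one core), the function names, and the greedy heuristic are assumptions for this example. They are neither the sfc2cpu method nor necessarily one of the baselines evaluated in the paper.

```python
from typing import List


def round_robin(vnf_loads: List[int], num_cores: int) -> List[int]:
    """Assign VNF i to core i mod num_cores, ignoring its load."""
    return [i % num_cores for i in range(len(vnf_loads))]


def least_loaded_greedy(vnf_loads: List[int], num_cores: int) -> List[int]:
    """Hypothetical heuristic: place VNFs, heaviest first, on the
    currently least-loaded core."""
    core_load = [0] * num_cores
    assignment = [0] * len(vnf_loads)
    order = sorted(range(len(vnf_loads)), key=lambda i: -vnf_loads[i])
    for i in order:
        core = min(range(num_cores), key=lambda c: core_load[c])
        assignment[i] = core
        core_load[core] += vnf_loads[i]
    return assignment


def core_loads(vnf_loads: List[int], assignment: List[int],
               num_cores: int) -> List[int]:
    """Sum the load (assumed unit: percent of one core) placed on each core."""
    loads = [0] * num_cores
    for load, core in zip(vnf_loads, assignment):
        loads[core] += load
    return loads


# Assumed example: four VNFs sharing two cores.
vnf_loads = [60, 10, 50, 10]
rr = core_loads(vnf_loads, round_robin(vnf_loads, 2), 2)
greedy = core_loads(vnf_loads, least_loaded_greedy(vnf_loads, 2), 2)
print("Round Robin core loads:", rr)
print("Greedy core loads:     ", greedy)
```

In this assumed scenario, Round Robin places both heavy VNFs on the same core and overloads it, while the load-aware placement keeps both cores below full utilization. A learned assignment such as the one sfc2cpu targets would additionally have to account for effects that a simple load sum does not capture.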
