The Case For In-Network Computing On Demand

Programmable network hardware can run services traditionally deployed on servers, resulting in orders-of-magnitude improvements in performance. Yet, despite these performance improvements, network operators remain skeptical of in-network computing. The conventional wisdom is that the operational costs of increased power consumption outweigh any performance benefits. Unless in-network computing can justify its costs, it will be dismissed as yet another academic exercise. In this paper, we challenge that assumption by providing a detailed power analysis of several in-network computing use cases. Our experiments show that in-network computing can be extremely power-efficient. In fact, measured per watt, the performance of a software system running on a commodity CPU can be improved by a factor of 100 using an FPGA and by a factor of 1000 using an ASIC. However, this efficiency depends on the system load. To address changing workloads, we propose in-network computing on demand, where services can be dynamically moved between servers and the network. By shifting the placement of services on demand, data centers can optimize for both performance and power efficiency.
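The claim that power efficiency depends on load, and that placement should therefore follow the workload, can be made concrete with a toy model. The sketch below assumes a hypothetical linear power model per placement (a fixed cost for hosting the service plus an incremental cost per unit of load, capped at a maximum throughput). The names `Placement` and `choose_placement` and every numeric parameter are illustrative assumptions, not measurements or mechanisms from the paper. It simply picks whichever placement serves the current load with the least power.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical linear power model for running one service at one location."""
    name: str
    fixed_w: float         # power attributable to hosting the service at zero load (assumed)
    watts_per_kqps: float  # incremental power per 1,000 queries per second (assumed)
    max_kqps: float        # highest load this placement can sustain (assumed)

    def power_w(self, load_kqps: float) -> float:
        """Power drawn at the given load; infinite if the load exceeds capacity."""
        if load_kqps > self.max_kqps:
            return float("inf")
        return self.fixed_w + self.watts_per_kqps * load_kqps


def choose_placement(load_kqps: float, options: list[Placement]) -> Placement:
    """Pick the placement that serves the current load with the least power."""
    return min(options, key=lambda p: p.power_w(load_kqps))


# Illustrative numbers only, not measurements from the paper.
SERVER = Placement("server", fixed_w=10.0, watts_per_kqps=0.1, max_kqps=1_000)
NETWORK = Placement("network", fixed_w=20.0, watts_per_kqps=0.001, max_kqps=10_000)

if __name__ == "__main__":
    for load in (50, 500, 5_000):
        best = choose_placement(load, [SERVER, NETWORK])
        print(f"{load:>6} kqps -> {best.name} ({best.power_w(load):.2f} W)")
```

With these made-up parameters the two power curves cross at roughly 101 kqps: below that the server is cheaper, and above it the network device wins (and beyond 1,000 kqps it is the only option). A practical on-demand controller would likely also need hysteresis or a migration cost term to avoid moving the service back and forth near the crossover point; that is omitted here for brevity.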
