[Figure: LaKe FPGA-NIC architecture. Components shown: clients, network interface, packet classifier (memcached traffic vs. normal traffic), BRAM and DRAM, DMA, kernel space, and user-space memcached; paths are labeled cache hit and cache miss.]

Programmable network hardware can run services traditionally deployed on servers, resulting in orders-of-magnitude improvements in performance. Yet, despite these improvements, network operators remain skeptical of in-network computing. The conventional wisdom is that the operational costs of increased power consumption outweigh any performance benefits. Unless in-network computing can justify its costs, it will be disregarded as yet another academic exercise.

In this paper, we challenge that assumption by providing a detailed power analysis of several in-network computing use cases. Our experiments show that in-network computing can be extremely power-efficient. In fact, per watt, the performance of a software system on a commodity CPU can be improved by a factor of 100 using an FPGA, and by a factor of 1000 using ASIC implementations. However, this efficiency depends on the system load. To address changing workloads, we propose in-network computing on demand, where services can be dynamically moved between servers and the network. By shifting the placement of services on demand, data centers can optimize for both performance and power efficiency.

CCS Concepts: • Networks → In-network processing; • Hardware → Power estimation and optimization.

ACM Reference Format: Yuta Tokusashi, Huynh Tu Dang, Fernando Pedone, Robert Soulé, and Noa Zilberman. 2019. The Case For In-Network Computing On Demand. In Proceedings of Fourteenth EuroSys Conference 2019 (EuroSys ’19). ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3302424.3303979

EuroSys ’19, March 25–28, 2019, Dresden, Germany. © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-6281-8/19/03.
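The load dependence described above, and the idea of moving a service between a server and the network on demand, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the linear idle-to-peak power model, the names `Placement` and `choose_placement`, and all wattage and capacity values are hypothetical placeholders, not measurements or mechanisms taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """A place to run a service, with an assumed linear power model."""
    name: str
    idle_watts: float    # power at zero load (hypothetical value)
    peak_watts: float    # power at full capacity (hypothetical value)
    capacity_qps: float  # max sustainable queries per second (hypothetical)

    def power_at(self, load_qps: float) -> float:
        """Estimated power in watts at the given load, clamped to capacity."""
        served = min(max(load_qps, 0.0), self.capacity_qps)
        utilization = served / self.capacity_qps
        return self.idle_watts + utilization * (self.peak_watts - self.idle_watts)

    def queries_per_watt(self, load_qps: float) -> float:
        """Served queries per second divided by the estimated power draw."""
        served = min(max(load_qps, 0.0), self.capacity_qps)
        return served / self.power_at(load_qps)


def choose_placement(load_qps: float, placements: list) -> Placement:
    """Pick the lowest-power placement that can sustain the load.

    If no placement can sustain it, fall back to the highest-capacity one.
    """
    feasible = [p for p in placements if p.capacity_qps >= load_qps]
    if not feasible:
        return max(placements, key=lambda p: p.capacity_qps)
    return min(feasible, key=lambda p: p.power_at(load_qps))


# Hypothetical placeholder numbers, chosen only to show a crossover point.
SERVER = Placement("server", idle_watts=40.0, peak_watts=120.0, capacity_qps=1e6)
NETWORK = Placement("network", idle_watts=60.0, peak_watts=65.0, capacity_qps=1e7)

if __name__ == "__main__":
    for load in (100_000, 500_000, 5_000_000):
        chosen = choose_placement(load, [SERVER, NETWORK])
        print(f"{load:>9} qps -> {chosen.name} "
              f"({chosen.queries_per_watt(load):.0f} queries/s per watt)")
```

With these placeholder values, the software placement draws less power at low load, while the in-network placement becomes the cheaper option once load grows; a real controller would need measured power curves and would also have to account for the cost of moving service state.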
