The nanoPU: A Nanosecond Network Stack for Datacenters
Nick McKeown | Changhoon Kim | Muhammad Shahbaz | Stephen Ibanez | Theo Jepsen | Alex Mallery | Serhat Arslan