CacheCloud: Towards Speed-of-light Datacenter Communication

Datacenter networks continue to advance unabated: 100Gb/s Ethernet is already a commercial reality, and 400Gb/s is in the standardization process. Within a single rack, inter-server latency will soon be in the range of 250ns, trending ever closer to the fundamental propagation delay of light in fiber. In this paper we argue that in this environment a major performance bottleneck is DRAM latency, which has stagnated at roughly 100ns per access. Consequently, data should be kept entirely in the CPU cache, which has an order of magnitude lower latency, and DRAM should be treated as a slower backing store. We describe the implications of designing a “speed-of-light” datacenter network stack that can keep up with ever-increasing link speeds while keeping end-to-end latency as close to the propagation delay of light as possible.
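As a back-of-envelope illustration of this latency budget, the sketch below compares per-packet network costs against the memory accesses performed at each end of a transfer. Only the ~250ns inter-server latency and ~100ns DRAM access time come from the abstract; the link rate, packet size, cable length, and last-level-cache latency are illustrative assumptions, not measurements from the paper.

```c
/* Back-of-envelope latency budget. Constants marked "assumed" are
 * illustrative values chosen for this sketch, not figures from the paper. */
#include <stdio.h>

int main(void) {
    const double link_gbps       = 400.0;  /* assumed line rate (Gb/s)            */
    const double packet_bytes    = 1500.0; /* assumed full-size Ethernet frame    */
    const double fiber_ns_per_m  = 5.0;    /* light in fiber travels ~0.2 m/ns    */
    const double rack_distance_m = 2.0;    /* assumed intra-rack cable length     */
    const double dram_access_ns  = 100.0;  /* per the abstract                    */
    const double llc_access_ns   = 13.0;   /* assumed: ~40 cycles at 3 GHz        */

    /* 400 Gb/s is 400 bits per nanosecond, so bits / Gb/s yields ns. */
    double serialize_ns = packet_bytes * 8.0 / link_gbps;
    double propagate_ns = fiber_ns_per_m * rack_distance_m;

    printf("serialization  : %5.1f ns\n", serialize_ns);          /*  30.0 ns */
    printf("propagation    : %5.1f ns\n", propagate_ns);          /*  10.0 ns */
    printf("2x DRAM access : %5.1f ns\n", 2.0 * dram_access_ns);  /* 200.0 ns */
    printf("2x LLC access  : %5.1f ns\n", 2.0 * llc_access_ns);   /*  26.0 ns */
    return 0;
}
```

Under these assumptions, serializing a full-size frame at 400Gb/s takes about 30ns and intra-rack propagation about 10ns, so a DRAM access at each end (roughly 200ns in total) dominates the ~250ns end-to-end budget, whereas serving the same data from the last-level cache adds only tens of nanoseconds.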
