Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks

Memory access is the major bottleneck in realizing multi-hundred-gigabit networks with commodity hardware; hence, it is essential to make good use of cache memory, which is a faster, but smaller memor ...
