Paper Information - Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks

Memory access is the major bottleneck in realizing multi-hundred-gigabit networks with commodity hardware, hence it is essential to make good use of cache memory that is a faster, but smaller memor ...

[1] Zhao Zhang, et al. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems, 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[2] Xiaodong Wang, et al. SWAP: Effective Fine-Grain Management of Shared Last-Level Caches with Minimum Hardware Support, 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[3] Jeffrey C. Mogul, et al. Nines are Not Enough: Meaningful Metrics for Clouds, 2019, HotOS.

[4] Andrew W. Moore, et al. Understanding PCIe performance for end host networking, 2018, SIGCOMM.

[5] Gerald Q. Maguire, et al. SNF: Synthesizing high performance NFV service chains, 2016, PeerJ Prepr.

[6] John K. Ousterhout. Always measure one level deeper, 2018, Commun. ACM.

[7] Aamer Jaleel, et al. Adaptive insertion policies for high performance caching, 2007, ISCA '07.

[8] Ren Wang, et al. HALO: Accelerating Flow Classification for Scalable Packet Processing in NFV, 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).

[9] Hyeontaek Lim, et al. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage, 2014, NSDI.

[10] Somayeh Sardashti, et al. The gem5 simulator, 2011, CARN.

[11] Thomas E. Anderson, et al. Ingress Pipeline Queues Packet Buffer DMA PipelineDMA Egress Pipeline, 2015.

[12] Karthikeyan Sankaralingam, et al. Dark Silicon and the End of Multicore Scaling, 2012, IEEE Micro.

[13] Orna Agmon Ben-Yehuda, et al. Ginseng: Market-Driven LLC Allocation, 2016, USENIX Annual Technical Conference.

[14] Herbert Bos, et al. NetCAT: Practical Cache Attacks from the Network, 2020, 2020 IEEE Symposium on Security and Privacy (SP).

[15] Peng Zheng, et al. A Closer Look at NFV Execution Models, 2019, APNet.

[16] Nate Foster, et al. NetCache: Balancing Key-Value Stores with Fast In-Network Caching, 2017, SOSP.

[17] Ankit Singla, et al. Enabling Efficient RDMA-based Synchronous Mirroring of Persistent Memory Transactions, 2018, ArXiv.

[18] Mark Silberstein, et al. Lynx: A SmartNIC-driven Accelerator-centric Architecture for Network Servers, 2020, ASPLOS.

[19] Andrew W. Moore, et al. NetFPGA SUME: Toward 100 Gbps as Research Commodity, 2014, IEEE Micro.

[20] Xiaosong Ma, et al. KPart: A Hybrid Cache Partitioning-Sharing Technique for Commodity Multicores, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[21] Ram Huggahalli, et al. Impact of Cache Coherence Protocols on the Processing of Network Traffic, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[22] Xiang Gao, et al. Using Direct Cache Access Combined with Integrated NIC Architecture to Accelerate Network Processing, 2012, 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems.

[23] Laurent Mathy, et al. Fast userspace packet processing, 2015, 2015 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS).

[24] Yingwei Luo, et al. DCAPS: dynamic cache allocation with partial sharing, 2018, EuroSys.

[25] Katerina J. Argyraki, et al. ResQ: Enabling SLOs in Network Function Virtualization, 2018, NSDI.

[26] Gerald Q. Maguire, et al. Make the Most out of Last Level Cache in Intel Processors, 2019, EuroSys.

[27] Boris Grot, et al. Scale-out ccNUMA: exploiting skew with strongly consistent caching, 2018, EuroSys.

[28] Michael M. Swift, et al. Loom: Flexible and Efficient NIC Packet Scheduling, 2019, NSDI.

[29] Brad Calder, et al. Reducing cache misses using hardware and software page placement, 1999, ICS '99.

[30] Mark Rowland, et al. The Intel® Xeon® processor E5 family architecture, power efficiency, and performance, 2012, 2012 IEEE Hot Chips 24 Symposium (HCS).

[31] Yang Li, et al. dCat: dynamic cache management for efficient, performance-sensitive infrastructure-as-a-service, 2018, EuroSys.

[32] Nick McKeown, et al. The Case for a Network Fast Path to the CPU, 2019, HotNets.

[33] Veljko M. Milutinovic, et al. The cache injection/cofetch architecture: initial performance evaluation, 1997, Proceedings Fifth International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems.

[34] Robert Ricci, et al. Taming Performance Variability, 2018, OSDI.

[35] Stanislav Lange, et al. Survey of Performance Acceleration Techniques for Network Function Virtualization, 2019, Proceedings of the IEEE.

[36] Insup Lee, et al. vCAT: Dynamic Cache Management Using CAT Virtualization, 2017, 2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS).

[37] Stefanos Kaxiras, et al. Splash-3: A properly synchronized benchmark suite for contemporary research, 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[38] Marco Chiesa, et al. A High-Speed Load-Balancer Design with Guaranteed Per-Connection-Consistency, 2020, NSDI.

[39] Gerald Q. Maguire, et al. RSS++: load and state-aware receive side scaling, 2019, CoNEXT.

[40] Ram Huggahalli, et al. Direct cache access for high bandwidth network I/O, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[41] Aamer Jaleel, et al. High performance cache replacement using re-reference interval prediction (RRIP), 2010, ISCA.

[42] Rebecca Steinert, et al. Metron: NFV Service Chains at the True Speed of the Underlying Hardware, 2018, NSDI.

[43] Dan Tsafrir, et al. IOctopus: Outsmarting Nonuniform DMA, 2020, ASPLOS.

[44] Fernando Pedone, et al. The Case For In-Network Computing On Demand, 2019, EuroSys.

[45] Nam Sung Kim, et al. Data Direct I/O Characterization for Future I/O System Exploration, 2020, 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[46] Laxmi N. Bhuyan, et al. A new server I/O architecture for high speed networks, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[47] Srihari Makineni, et al. Characterization of Direct Cache Access on multi-core systems and 10GbE, 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[48] Harsha Basavaraj, et al. A case for effective utilization of Direct Cache Access for big data workloads, 2017.

[49] Varghese George, et al. Power management of the third generation intel core micro architecture formerly codenamed ivy bridge, 2012, 2012 IEEE Hot Chips 24 Symposium (HCS).

[50] David G. Andersen, et al. Design Guidelines for High Performance RDMA Systems, 2016, USENIX ATC.

[51] Mingyu Chen, et al. DMA cache: Using on-chip storage to architecturally separate I/O data from CPU data for improving I/O performance, 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[52] Jan Reineke, et al. CAMA: A Predictable Cache-Aware Memory Allocator, 2011, 2011 23rd Euromicro Conference on Real-Time Systems.

[53] Woongki Baek, et al. CoPart: Coordinated Partitioning of Last-Level Cache and Memory Bandwidth for Fairness-Aware Workload Consolidation on Commodity Servers, 2019, EuroSys.

[54] Sparsh Mittal, et al. A Survey of Techniques for Cache Partitioning in Multicore Processors, 2017, ACM Comput. Surv.

[55] Geoffrey M. Voelker, et al. CacheCloud: Towards Speed-of-light Datacenter Communication, 2018, HotCloud.

[56] Geoffrey M. Voelker, et al. Dark packets and the end of network scaling, 2018, ANCS.

[57] Ashish Venkat, et al. Packet Chasing: Spying on Network Packets over a Cache Side-Channel, 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).