Packet Processing Architecture Using Last-Level-Cache Slices and Interleaved 3D-Stacked DRAM
Eiji Oki | Akio Kawabata | Fujun He | Tomohiro Korikawa
[1] Wei Zhang,et al. Hardware-Based and Hybrid L1 Data Cache Bypassing to Improve GPU Performance , 2015, 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems.
[2] Thomas F. Wenisch,et al. System-level implications of disaggregated memory , 2012, IEEE International Symposium on High-Performance Computer Architecture.
[3] Li Fan,et al. Web caching and Zipf-like distributions: evidence and implications , 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320).
[4] Joonho Kong,et al. A DVFS-aware cache bypassing technique for multiple clock domain mobile SoCs , 2017, IEICE Electron. Express.
[5] Yan Solihin,et al. Counter-Based Cache Replacement and Bypassing Algorithms , 2008, IEEE Transactions on Computers.
[6] Mikko H. Lipasti,et al. Data compression for thermal mitigation in the Hybrid Memory Cube , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).
[7] Ki-Seok Chung,et al. CasHMC: A Cycle-Accurate Simulator for Hybrid Memory Cube , 2017, IEEE Computer Architecture Letters.
[8] Nick McKeown,et al. Routing lookups in hardware at memory access speeds , 1998, Proceedings. IEEE INFOCOM '98, the Conference on Computer Communications. Seventeenth Annual Joint Conference of the IEEE Computer and Communications Societies. Gateway to the 21st Century (Cat. No.98.
[9] Dong Li,et al. Integrated Thermal Analysis for Processing In Die-Stacking Memory , 2016, MEMSYS.
[10] Saurabh Gupta,et al. Adaptive Cache Bypassing for Inclusive Last Level Caches , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[11] Sudhakar Yalamanchili,et al. Demystifying the characteristics of 3D-stacked memories: A case study for Hybrid Memory Cube , 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).
[12] Eiji Oki,et al. Carrier-Scale Packet Processing Architecture Using Interleaved 3D-Stacked DRAM and Its Analysis , 2019, IEEE Access.
[13] Babak Falsafi,et al. Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[14] Yuki Kobayashi,et al. Accelerating NFV application using CPU-FPGA tightly coupled architecture , 2017, 2017 International Conference on Field Programmable Technology (ICFPT).
[15] Jangwoo Kim,et al. Cryogenic Computer Architecture Modeling with Memory-Side Case Studies , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[16] Yiran Chen,et al. Statistical Cache Bypassing for Non-Volatile Memory , 2016, IEEE Transactions on Computers.
[17] Elkin Garcia,et al. A Reconfigurable Computing System Based on a Cache-Coherent Fabric , 2011, 2011 International Conference on Reconfigurable Computing and FPGAs.
[18] Li Xiao,et al. DiCAS: An Efficient Distributed Caching Mechanism for P2P Systems , 2006, IEEE Transactions on Parallel and Distributed Systems.
[19] Fulvio Risso,et al. Introducing SmartNICs in Server-Based Data Plane Processing: The DDoS Mitigation Use Case , 2019, IEEE Access.
[20] Wei Chen,et al. A 22nm 2.5MB slice on-die L3 cache for the next generation Xeon® Processor , 2013, 2013 Symposium on VLSI Circuits.
[21] Madhu Mutyam,et al. SkipCache: application aware cache management for chip multi-processors , 2015, IET Comput. Digit. Tech..
[22] Sparsh Mittal,et al. A Survey of Cache Bypassing Techniques , 2016 .
[23] Tzi-cker Chiueh,et al. High-performance IP routing table lookup using CPU caching , 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320).
[24] Katerina J. Argyraki,et al. RouteBricks: exploiting parallelism to scale software routers , 2009, SOSP '09.
[25] Edith Cohen,et al. Proactive caching of DNS records: addressing a performance bottleneck , 2001, Proceedings 2001 Symposium on Applications and the Internet.
[26] Kiyoung Choi,et al. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[27] Keith Kim,et al. HBM (High Bandwidth Memory) DRAM Technology and Architecture , 2017, 2017 IEEE International Memory Workshop (IMW).
[28] Bahar Asgari,et al. Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube , 2017, 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[29] Wolfgang Kellerer,et al. Towards optimal adaptation of NFV packet processing to modern CPU memory architectures , 2017, CAN@CoNEXT.
[30] Brian Rogers,et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling , 2009, ISCA '09.
[31] Radu Marculescu,et al. On-chip traffic modeling and synthesis for MPEG-2 video applications , 2004, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[32] Hirochika Asai,et al. Poptrie: A Compressed Trie with Population Count for Fast and Scalable Software IP Routing Table Lookup , 2015, SIGCOMM.
[33] Gabriel H. Loh,et al. Thermal Feasibility of Die-Stacked Processing in Memory , 2014 .
[34] Akhilesh Kumar,et al. MoDe-X: Microarchitecture of a Layout-Aware Modular Decoupled Crossbar for On-Chip Interconnects , 2014, IEEE Transactions on Computers.
[35] Donald A. Calahan,et al. Models of Access Delays in Multiprocessor Memories , 1992, IEEE Trans. Parallel Distributed Syst..
[36] Sujit Dey,et al. Evaluation of the traffic-performance characteristics of system-on-chip communication architectures , 2001, VLSI Design 2001. Fourteenth International Conference on VLSI Design.
[37] Sangwoo Han,et al. PIM architecture exploration for HMC , 2016, 2016 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS).
[38] Boris Grot,et al. Farewell My Shared LLC! A Case for Private Die-Stacked DRAM Caches for Servers , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[39] Ki-Seok Chung,et al. HMC-MAC: Processing-in Memory Architecture for Multiply-Accumulate Operations with Hybrid Memory Cube , 2018, IEEE Computer Architecture Letters.
[40] Daniel A. Jiménez,et al. Adaptive GPU cache bypassing , 2015, GPGPU@PPoPP.
[41] Thomas F. Wenisch,et al. Disaggregated memory for expansion and sharing in blade servers , 2009, ISCA '09.
[42] Sunggu Lee,et al. Hybrid Main Memory for High Bandwidth Multi-Core System , 2015, IEEE Transactions on Multi-Scale Computing Systems.
[43] Min Huang,et al. An Energy Efficient 32-nm 20-MB Shared On-Die L3 Cache for Intel® Xeon® Processor E5 Family , 2013, IEEE Journal of Solid-State Circuits.
[44] Houman Homayoun,et al. Heterogeneous HMC+DDRx Memory Management , 2017 .
[45] Robert Tappan Morris,et al. DNS performance and the effectiveness of caching , 2001, IMW '01.
[46] George Kurian,et al. Locality-aware data replication in the Last-Level Cache , 2014, HPCA.
[47] Scott Klasky,et al. SELF: A High Performance and Bandwidth Efficient Approach to Exploiting Die-Stacked DRAM as Part of Memory , 2017, 2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS).
[48] David Blaauw,et al. Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[49] Gerald Q. Maguire,et al. Make the Most out of Last Level Cache in Intel Processors , 2019, EuroSys.
[50] Vivek S. Pai,et al. Towards understanding modern web traffic , 2011, SIGMETRICS '11.
[51] Gorka Irazoqui Apecechea,et al. Systematic Reverse Engineering of Cache Slice Selection in Intel Processors , 2015, 2015 Euromicro Conference on Digital System Design.
[52] Geoffrey Elliott,et al. Packet Matching on FPGAs Using HMC Memory: Towards One Million Rules , 2017, FPGA.
[53] Eiji Oki,et al. Packet Processing Architecture With Off-Chip LLC Using Interleaved 3D-Stacked DRAM , 2019, 2019 IEEE 20th International Conference on High Performance Switching and Routing (HPSR).
[54] Sangjin Han,et al. PacketShader: a GPU-accelerated software router , 2010, SIGCOMM '10.
[55] Li-Shiuan Peh,et al. A Statistical Traffic Model for On-Chip Interconnection Networks , 2006, 14th IEEE International Symposium on Modeling, Analysis, and Simulation.
[56] Efraim Rotem,et al. Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge , 2012, IEEE Micro.
[57] Eriko Nurvitadhi,et al. A Customizable Matrix Multiplication Framework for the Intel HARPv2 Xeon+FPGA Platform: A Deep Learning Case Study , 2018, FPGA.
[58] Eiji Oki,et al. Carrier-Scale Packet Processing System Using Interleaved 3D-Stacked DRAM , 2018, 2018 IEEE International Conference on Communications (ICC).