Carrier-Scale Packet Processing Architecture Using Interleaved 3D-Stacked DRAM and Its Analysis

New network services such as the Internet of Things and edge computing are accelerating the growth in traffic volume, the number of connected devices, and the diversity of communication. Next-generation carrier network infrastructure must be far more scalable and adaptive to rapid growth and diversification in network demand, at much lower cost. A more virtualization-aware, flexible, and inexpensive system based on general-purpose hardware is necessary to transform the traditional carrier network into a more adaptive, next-generation network. In this paper, we propose an architecture for carrier-scale packet processing that is based on interleaved three-dimensional (3D)-stacked dynamic random access memory (DRAM) devices. The proposed architecture enhances memory access concurrency by leveraging the vault-level parallelism and bank interleaving of 3D-stacked DRAM. The proposed architecture uses hash-function-based distribution of memory requests to each set of vault and bank; a significant portion of the full carrier-scale tables. We introduce an analytical model of the proposed architecture for two traffic patterns: one with random memory request arrivals and one with bursty arrivals. Using the model, we calculate the performance of a typical Internet Protocol routing application as a benchmark of carrier-scale packet processing in which main memory accesses are inevitable. The evaluation shows that the proposed architecture achieves around 80 Gbps for carrier-scale packet processing under both random and bursty request arrivals.
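
The hash-function-based distribution of memory requests across vault and bank sets can be illustrated with a minimal sketch. It is not the authors' implementation. The vault and bank counts (32 and 16), the use of CRC32 as the hash, the 4-byte lookup keys, and the function names are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import zlib
from collections import Counter

# Assumed geometry for illustration only; the abstract gives no figures.
NUM_VAULTS = 32
BANKS_PER_VAULT = 16


def select_vault_bank(key: bytes,
                      num_vaults: int = NUM_VAULTS,
                      banks_per_vault: int = BANKS_PER_VAULT) -> tuple[int, int]:
    """Map a lookup key to a (vault, bank) pair using a hash.

    CRC32 is a stand-in hash (an assumption); any hash that spreads
    keys evenly would serve the same purpose.
    """
    slot = zlib.crc32(key) % (num_vaults * banks_per_vault)
    # divmod splits the flat slot index into (vault index, bank index).
    return divmod(slot, banks_per_vault)


def load_per_slot(keys) -> Counter:
    """Count how many requests land on each (vault, bank) pair."""
    return Counter(select_vault_bank(k) for k in keys)


if __name__ == "__main__":
    # Hypothetical workload: 100,000 distinct 4-byte keys (e.g. IPv4 addresses).
    keys = [i.to_bytes(4, "big") for i in range(100_000)]
    load = load_per_slot(keys)
    print("slots used:", len(load))
    print("min/max requests per slot:", min(load.values()), max(load.values()))
```

Spreading requests evenly over vault and bank pairs is what lets independent banks serve requests concurrently; the minimum and maximum per-slot counts printed above give a rough view of how balanced a given hash is for a given key set.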
