Design and Implementation of High-Performance Memory Systems for Future Packet Buffers

In this paper we address the design of a future high-speed router that supports line rates as high as OC-3072 (160 Gb/s), around one hundred ports, and several service classes. Building such a high-speed router would raise many technological problems, one of them being the packet buffer design, mainly because in router design it is important to provide worst-case bandwidth guarantees and not just average-case optimizations. A previous packet buffer design provides worst-case bandwidth guarantees by using a hybrid SRAM/DRAM approach. Next-generation routers need to support hundreds of interfaces (i.e., ports and service classes). Unfortunately, high bandwidth for hundreds of interfaces requires the previous design to use large SRAMs, which become a bandwidth bottleneck. The key observation we make is that the SRAM size is proportional to the DRAM access time, but we can reduce the effective DRAM access time by overlapping multiple accesses to different banks, allowing us to reduce the SRAM size. The key challenge is that, to keep the worst-case bandwidth guarantees, we need to guarantee that there are no bank conflicts while the accesses are in flight. We guarantee the absence of bank conflicts by reordering the DRAM requests using a mechanism similar to a modern issue queue. Because our design may lead to fragmentation of memory across packet buffer queues, we propose to share the DRAM space among multiple queues by renaming the queue slots. To the best of our knowledge, the design proposed in this paper is the fastest buffer design using commodity DRAM to be published to date.
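The idea of overlapping accesses to different banks while reordering requests to avoid bank conflicts can be illustrated with a toy model. The following is a minimal sketch, not the mechanism proposed in the paper: the function name `schedule`, the parameter `busy_cycles`, and the greedy oldest-ready-first policy are all assumptions made for illustration, and the sketch makes no claim about worst-case guarantees.

```python
def schedule(requests, num_banks, busy_cycles, reorder=True):
    """Return a list of (cycle, bank) issue events for a toy banked DRAM.

    requests    -- bank index of each pending access, oldest first
    num_banks   -- number of independent DRAM banks
    busy_cycles -- cycles a bank stays busy after an access is issued
                   (hypothetical parameter standing in for the access time)
    reorder     -- True: issue the oldest request whose bank is free
                   False: issue strictly in order, stalling on a conflict
    """
    pending = list(requests)
    free_at = [0] * num_banks  # first cycle at which each bank is idle again
    issued = []
    cycle = 0
    while pending:
        # In-order issue may only look at the head of the queue.
        window = pending if reorder else pending[:1]
        for i, bank in enumerate(window):
            if free_at[bank] <= cycle:
                issued.append((cycle, bank))
                free_at[bank] = cycle + busy_cycles
                del pending[i]  # safe: we leave the loop immediately
                break
        cycle += 1  # at most one request is issued per cycle
    return issued


requests = [0, 0, 1, 2, 3, 1]
in_order = schedule(requests, num_banks=4, busy_cycles=4, reorder=False)
reordered = schedule(requests, num_banks=4, busy_cycles=4, reorder=True)
print("in-order finishes issuing at cycle", in_order[-1][0])
print("reordered finishes issuing at cycle", reordered[-1][0])
```

In this toy run the in-order scheduler stalls whenever the head request targets a busy bank, whereas the reordering scheduler fills those cycles with requests to idle banks, so the same six accesses are issued in fewer cycles and no bank is ever accessed while still busy.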
