Compiling high throughput network processors

Gorilla is a methodology for generating FPGA-based solutions that is especially well suited to data-parallel applications with fine-grained irregularity. Such irregularity simultaneously degrades performance and increases power consumption on many data-parallel processors, such as General-Purpose Graphics Processing Units (GPGPUs). Gorilla achieves high performance and low power through FPGA-tailored parallelization techniques and application-specific hardwired accelerators, processing engines, and communication mechanisms. Automatic compilation from a stylized C language and from templates that define the hardware structure, coupled with the intrinsic flexibility of FPGAs, provides high performance, low power, and programmability. Gorilla's capabilities are demonstrated by generating a family of core-router network processors that process up to 100 Gbps (200 Mpps for 64-byte packets) and support any mix of IPv4, IPv6, and Multi-Protocol Label Switching (MPLS) packets on a single FPGA with off-chip IP lookup tables. A 40 Gbps version of that network processor was run with an embedded test rig on a Xilinx Virtex-6 FPGA to verify both performance and correctness. Its measured power consumption is comparable to that of full-custom commercial network processors. The paper also demonstrates how Gorilla can be used to generate merged virtual routers, saving FPGA resources.
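
As a rough check on the throughput figures above, the packet rate implied by a line rate and a packet size can be computed directly. The sketch below is illustrative only and is not taken from the paper: the function name and the `overhead_bytes` parameter are hypothetical, and the assumption that the quoted 200 Mpps figure ignores per-packet Ethernet framing overhead (preamble plus inter-frame gap, 20 bytes) is ours.

```python
def packets_per_second(line_rate_bps: float, packet_bytes: int,
                       overhead_bytes: int = 0) -> float:
    """Packet rate sustained by a link of the given bit rate.

    overhead_bytes models per-packet wire overhead that does not count
    toward the packet size (hypothetical parameter, e.g. 20 bytes for
    Ethernet preamble and inter-frame gap).
    """
    bits_per_packet = (packet_bytes + overhead_bytes) * 8
    return line_rate_bps / bits_per_packet


# 100 Gbps, 64-byte packets, no framing overhead (assumed reading of the abstract)
raw = packets_per_second(100e9, 64)

# Same link with 20 bytes of Ethernet framing overhead per packet
framed = packets_per_second(100e9, 64, overhead_bytes=20)

print(f"raw:    {raw / 1e6:.1f} Mpps")
print(f"framed: {framed / 1e6:.1f} Mpps")
```

Without framing overhead the result is about 195 Mpps, which is consistent with the rounded 200 Mpps quoted above; with standard Ethernet framing the same link would carry about 149 Mpps of 64-byte packets.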
