Network Processing in Multi-core FPGAs with Integrated Cache-Network Interface

Per-core local (scratchpad) memories allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. A multicore FPGA platform with cache-integrated network interfaces (NIs) is presented, appropriate for scalable multicores, that combines the best of two worlds: the flexibility of caches (using implicit communication) and the efficiency of scratchpad memories (using explicit communication). On-chip SRAM is configurably shared among caching, scratchpad, and virtualized NI functions. The proposed system has been implemented as a four-core design on an FPGA. Special hardware primitives (counters, queues), which are well suited to network processing applications, are used for communication and synchronization among the cores. The paper evaluates the proposed system in the domain of network processing using two representative benchmarks, one for header processing and one for payload processing. Both overall performance and communication overhead are measured. Furthermore, two approaches to inter-processor communication are evaluated and compared: a common queue and distributed queues.
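The difference between the two communication approaches named above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the function names, the per-packet cost values, and the round-robin dispatch policy are all hypothetical, and the model ignores queue contention and communication overhead, which are the quantities the paper actually measures.

```python
import heapq
from collections import deque


def common_queue_makespan(costs, num_cores=4):
    """One shared queue: whichever core becomes free first takes the next packet.

    `costs` is a list of hypothetical per-packet processing times.
    Returns the time at which the last core finishes.
    """
    # Min-heap of (time at which the core becomes free, core id).
    free_at = [(0, core) for core in range(num_cores)]
    heapq.heapify(free_at)
    for cost in costs:
        t, core = heapq.heappop(free_at)      # earliest-free core dequeues
        heapq.heappush(free_at, (t + cost, core))
    return max(t for t, _ in free_at)


def distributed_queues_makespan(costs, num_cores=4):
    """One queue per core: packets are assigned round-robin (an assumed policy).

    Each core only processes the packets in its own queue.
    """
    queues = [deque() for _ in range(num_cores)]
    for i, cost in enumerate(costs):
        queues[i % num_cores].append(cost)
    return max(sum(q) for q in queues)


if __name__ == "__main__":
    # Skewed workload: every fourth packet is expensive (illustrative values).
    skewed = [8, 1, 1, 1, 8, 1, 1, 1]
    print("common queue:      ", common_queue_makespan(skewed))
    print("distributed queues:", distributed_queues_makespan(skewed))
```

In this toy model the shared queue balances a skewed workload better, because an idle core always picks up the next packet, while per-core queues leave the other cores idle behind an unlucky assignment. On real hardware a shared queue can also become a contention point, which this sketch does not capture.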
