Parallelization Strategies for Network Interface Firmware

Typical data-intensive embedded applications have large amounts of instruction-level parallelism, which is often exploited with wide-issue VLIW processors. In contrast, event-driven embedded applications are believed to have very little instruction-level parallelism, so these applications often utilize much simpler processor cores. Programmable network interface cards, for example, utilize thread-level parallelism across multiple processor cores to handle multiple events concurrently. However, the synchronization required to access a device’s shared external I/O interfaces leads to scalability limitations and diminishing returns. This paper compares instruction-level parallelism with thread-level parallelism in control-dominated network interface firmware and finds that, though thread-level parallelism scales to higher levels of performance, there exists a significant amount of instruction-level parallelism that can be exploited by traditional wide-issue VLIW processors. For example, a seven-wide VLIW-based network interface architecture achieves approximately the same frame throughput as a two-way single-issue multiprocessor implementation. This motivates the use of wide-issue VLIW architectures, typically used for media and signal processing workloads, to extract parallelism previously left unutilized by existing thread-level approaches. This paper advocates that a combination of these two approaches should lead to even higher levels of performance than today’s control-oriented embedded systems architectures achieve.
