Paper Information - Parallelization Strategies for Network Interface Firmware

Typical data-intensive embedded applications have large amounts of instruction-level parallelism that is often exploited with wide-issue VLIW processors. In contrast, event-driven embedded applications are believed to have very little instruction-level parallelism, so these applications often utilize much simpler processor cores. Programmable network interface cards, for example, utilize thread-level parallelism across multiple processor cores to handle multiple events concurrently. However, the synchronization required to access a device’s shared external I/O interfaces leads to scalability limitations and diminishing returns. This paper compares instruction-level parallelism with thread-level parallelism in control-dominated network interface firmware and finds that, although thread-level parallelism scales to higher levels of performance, a significant amount of instruction-level parallelism exists that can be exploited by traditional wide-issue VLIW processors. For example, a seven-wide VLIW-based network interface architecture achieves approximately the same frame throughput as a two-way single-issue multiprocessor implementation. This motivates the use of wide-issue VLIW architectures, typically used for media and signal processing workloads, to extract parallelism previously left unutilized by existing thread-level approaches. This paper advocates that a combination of these two approaches should lead to even higher levels of performance than today’s control-oriented embedded systems architectures.

Scott Rixner | Paul Willmann | Michael Brogioli
