Fast critical sections via thread scheduling for FPGA-based multithreaded processors

As FPGA-based systems that include soft processors become increasingly common, we are motivated to better understand their architectural trade-offs and improve their efficiency. Previous work has demonstrated that multithreading support in soft processors can tolerate pipeline and I/O latencies and improve overall system throughput; however, that work assumes an abundance of completely independent threads to execute. In this work we show that for real workloads, in particular packet processing applications, a large fraction of processor cycles is wasted while threads await synchronization on shared data structures, limiting the benefits of a multithreaded design. We address this challenge by proposing a method of scheduling threads in hardware that allows the multithreaded pipeline to be more fully utilized without significant cost in area or frequency. We evaluate our technique relative to conventional multithreading using both simulation and a real implementation on a NetFPGA board. Across three deep-packet inspection applications that are threaded, synchronize, and share data structures, overall packet throughput increases by 63%, 31%, and 41%, respectively.
