VESPA: portable, scalable, and flexible FPGA-based vector processors

While soft processors are increasingly common in FPGA-based embedded systems, it remains a challenge to scale their performance. We propose extending soft processor instruction sets to include support for vector processing. The resulting system of vectorized software and soft vector processor hardware is (i) portable to any FPGA architecture and vector processor configuration, (ii) scalable to larger yet higher-performance designs, and (iii) flexible, allowing the underlying vector processor to be customized to match the needs of each application. Using our robust and verified parameterized vector processor design and industry-standard EEMBC benchmarks, we evaluate the performance and area trade-offs for different soft vector processor configurations using an FPGA development platform with DDR SDRAM. We find that on average we can scale performance from 1.8x up to 6.3x for a vector processor design that saturates the capacity of our platform's Stratix 1S80 FPGA. We also automatically generate application-specific vector processors with reduced datapath width and instruction set support which combined reduce the area by up to 70% (61% on average) without affecting performance.

[1]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[2]  Jonathan Rose,et al.  Enhancing the area-efficiency of FPGAs with hard circuits using shadow clusters , 2006, 2006 IEEE International Conference on Field Programmable Technology.

[3]  Guy Lemieux,et al.  Vector processing as a soft-core CPU accelerator , 2008, FPGA '08.

[4]  Jonathan Rose,et al.  The Transmogrifier-4: an FPGA-based hardware development system with multi-gigabyte memory capacity and high host and memory bandwidth , 2005, Proceedings. 2005 IEEE International Conference on Field-Programmable Technology, 2005..

[5]  J. Gregory Steffan,et al.  The microarchitecture of FPGA-based soft processors , 2005, CASES '05.

[6]  R. Shackleton A Quantitative Approach , 2005 .

[7]  Robert J. Fowler,et al.  MINT: a front end for efficient simulation of shared-memory multiprocessors , 1994, Proceedings of International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.

[8]  Christoforos E. Kozyrakis,et al.  Overcoming the limitations of conventional vector processors , 2003, ISCA '03.

[9]  Christoforos E. Kozyrakis,et al.  Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks , 2002, MICRO.

[10]  Shuai Wang,et al.  Vector Processing Support for FPGA-Oriented High Performance Applications , 2007, IEEE Computer Society Annual Symposium on VLSI (ISVLSI '07).

[11]  Wonyong Sung,et al.  An FPGA based SIMD processor with a vector memory unit , 2006, 2006 IEEE International Symposium on Circuits and Systems.

[12]  David A. Patterson,et al.  Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[13]  Jonathan Rose,et al.  Application-specific customization of soft processor microarchitecture , 2006, FPGA '06.

[14]  Alex K. Jones,et al.  An FPGA-based VLIW processor with custom hardware execution , 2005, FPGA '05.

[15]  David Patterson,et al.  Embracing and Extending 20th-Century Instruction Set Architectures , 2007, Computer.

[16]  Christoforos E. Kozyrakis,et al.  Scalable Vector Processors for Embedded Systems , 2003, IEEE Micro.

[17]  J. Gregory Steffan,et al.  Scaling Soft Processor Systems , 2008, 2008 16th International Symposium on Field-Programmable Custom Computing Machines.