Accelerator compiler for the VENICE vector processor

This paper describes the compiler design for VENICE, a new soft vector processor (SVP). The compiler is a new back-end target for Microsoft Accelerator, a high-level data-parallel library for C++ and C#. It allows high-level programs to be compiled automatically into VENICE assembly code, avoiding the manual assembly programming that previous SVPs required. Experimental results show that the compiler generates scalable parallel code with execution times comparable to those of hand-written VENICE assembly code. On data-parallel applications, VENICE running at 100 MHz on an Altera DE3 platform achieves speeds comparable to one core of a 3.5 GHz Intel Xeon W3690 processor, outperforming it on four of six benchmarks by up to 3.2x.
