Synchronized-transfer-level design methodology applied to hardware matrix multiplication

In an effort to reduce the productivity gap between hardware design and software programming practices, this paper applies our synchronized-transfer-level hardware design methodology to the implementation of a hardware matrix multiplication accelerator. The methodology builds on a hardware description language in which the designer manages dynamic connections between sources and sinks that may not always be ready to send or receive data tokens. In addition to these connections, the designer can constrain the authorization of data transfers by means of logical rules that make transfers dependent on each other. Combining the finite state machine and constraint programming paradigms, the description language enhances the ability to express and exploit low-level parallelism. A compiler automates the generation and optimization of the synchronization logic, whose low-level complexity is thus hidden from the designer. Applied to the design of a pipelined matrix multiplication circuit, the proposed methodology yields computing performance similar to that of the dedicated designs reported in the literature, but with a shorter design time (a single day), simpler source code, and no need for advanced hardware design skills.
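
The transfer model described above can be illustrated with a small software simulation. The Python sketch below is not the paper's description language or its compiler output; the names `Source`, `Sink`, `step`, and `demo` are hypothetical, and the rule format (a pair meaning "the first transfer may fire only if the second also fires") is an assumption chosen for illustration. Each cycle, a connection is a candidate when its source has a token and its sink has room, and the rules then withdraw any candidate whose dependency cannot fire.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical token producer: ready while it still holds tokens."""
    tokens: list

    def ready(self):
        return bool(self.tokens)


@dataclass
class Sink:
    """Hypothetical token consumer: ready while it has free capacity."""
    capacity: int
    received: list = field(default_factory=list)

    def ready(self):
        return len(self.received) < self.capacity


def step(connections, rules):
    """Simulate one cycle and return the set of transfers that fired.

    connections: name -> (Source, Sink); each end is assumed to belong
                 to a single connection.
    rules:       pairs (a, b) meaning "a may fire only if b fires".
    """
    # A transfer is a candidate only when both of its ends are ready.
    enabled = {name for name, (src, snk) in connections.items()
               if src.ready() and snk.ready()}
    # Withdraw candidates whose dependency is not firing, until stable.
    changed = True
    while changed:
        changed = False
        for a, b in rules:
            if a in enabled and b not in enabled:
                enabled.discard(a)
                changed = True
    # Perform the authorized transfers.
    for name in sorted(enabled):
        src, snk = connections[name]
        snk.received.append(src.tokens.pop(0))
    return enabled


def demo():
    """Two operand streams feeding a multiplier must move in lockstep."""
    a_stream, b_stream = Source([1, 2, 3]), Source([10, 20])
    mul_a, mul_b = Sink(capacity=4), Sink(capacity=4)
    connections = {"a": (a_stream, mul_a), "b": (b_stream, mul_b)}
    rules = [("a", "b"), ("b", "a")]  # each transfer requires the other
    fired = [step(connections, rules) for _ in range(3)]
    return fired, mul_a.received, mul_b.received, a_stream.tokens
```

In `demo`, the third cycle fires nothing: the `a` stream still holds a token, but the rule tying it to `b` blocks the transfer because the `b` stream is empty. In the methodology described above, this kind of synchronization logic is what the compiler generates, so the designer states only the connections and rules.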
