Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication

Customized pipeline designs that minimize the pipeline initiation interval (II) maximize the throughput of FPGA accelerators designed with high-level synthesis (HLS). What is the impact of minimizing II on energy efficiency? Using a matrix-multiply accelerator, we show that designs with II>1 can sometimes consume less dynamic energy than the II=1 design because of interconnect savings, but II=1 always achieves energy close to the minimum. We also identify sources of inefficient mapping in the commercial tool flow.
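To make the throughput side of this trade-off concrete, here is a minimal sketch of the usual first-order cycle model for a pipelined loop: the pipeline fills once, then a new iteration starts every II cycles. The function names, the pipeline depth of 10 cycles, the 64x64 matrix size, and the assumption of one multiply-accumulate per iteration of a flattened loop nest are all illustrative and not taken from the paper. The model covers latency only and says nothing about energy.

```python
def pipelined_loop_cycles(trip_count: int, ii: int, depth: int) -> int:
    """First-order cycle count for a pipelined loop.

    trip_count: number of loop iterations
    ii:         initiation interval (cycles between iteration starts)
    depth:      pipeline depth, i.e. latency of one iteration in cycles
    """
    if trip_count <= 0:
        return 0
    # The first iteration takes `depth` cycles; each later one starts `ii` cycles after the previous.
    return depth + ii * (trip_count - 1)


def matmul_cycles(n: int, ii: int, depth: int) -> int:
    """Cycles for an n x n matrix multiply, assuming (hypothetically) a single
    flattened loop nest that issues one multiply-accumulate per iteration."""
    return pipelined_loop_cycles(n * n * n, ii, depth)


if __name__ == "__main__":
    # Illustrative parameters only: 64x64 matrices, pipeline depth of 10 cycles.
    for ii in (1, 2, 4):
        print(f"II={ii}: {matmul_cycles(64, ii, depth=10)} cycles")
```

Under this model, doubling II roughly doubles the run time for long loops, which is why II=1 is the throughput-optimal target that the energy question is posed against.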
