Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication

Customized pipeline designs that minimize the pipeline initiation interval (II) maximize the throughput of FPGA accelerators designed with high-level synthesis (HLS). What is the impact of minimizing II on energy efficiency? Using a matrix-multiply accelerator, we show that matrix multiplies with II>1 can sometimes reduce dynamic energy below II=1 due to interconnect savings, but II=1 always achieves energy close to the minimum. We also identify sources of inefficient mapping in the commercial tool flow.
