Scalable and Efficient Linear Algebra Kernel Mapping for Low Energy Consumption on the Layers CGRA

A scalable mapping is proposed for three important kernels from the numerical linear algebra domain that exploits architectural features to reach asymptotically optimal efficiency and low energy consumption. Performance and power were evaluated for input matrix sizes ranging from 64\(\times \)64 to 16384\(\times \)16384. Twelve architectural variants with up to 10\(\times \)10 processing elements (PEs) were used to explore the scalability of the mapping and the architecture, achieving an energy increase of \(<10\,\%\) for architectures up to 8\(\times \)8 PEs, coupled with performance speed-ups of more than an order of magnitude. This enables a clean area-performance trade-off on the Layers architecture while keeping energy nearly constant across the variants.
