Composable, parameterizable templates for high-level synthesis

High-level synthesis (HLS) tools aim to make FPGA programming easier by raising the level of programming abstraction. Yet to obtain an efficient hardware design from an HLS tool, the designer must know how to write HLS code that maps to an efficient low-level hardware architecture. This requires substantial hardware knowledge, which limits the adoption of HLS tools beyond hardware designers. In this work, we develop an approach based on parameterizable templates that can be composed using common data access patterns. This yields a methodology for building efficient hardware implementations. Our results demonstrate that a small number of optimized templates can be hierarchically composed to develop highly optimized hardware implementations for large applications.
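To make the idea of composing parameterizable templates concrete, the following is a minimal C++ sketch. It is not taken from this work. The names `map_template`, `reduce_template`, and `sum_of_squares` are hypothetical, and the choice of map-style and reduce-style access patterns is an assumption made for illustration, since the abstract does not name the patterns it uses. Sizes are compile-time parameters, which is the form HLS tools typically need in order to unroll or pipeline a loop. No tool-specific pragmas are shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical "map" template: applies op to every element independently.
// T (element type), N (size), and Op (operation) are the parameters.
template <typename T, std::size_t N, typename Op>
std::array<T, N> map_template(const std::array<T, N>& in, Op op) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = op(in[i]);  // no dependence between iterations
    }
    return out;
}

// Hypothetical "reduce" template: folds all elements into one value.
template <typename T, std::size_t N, typename Op>
T reduce_template(const std::array<T, N>& in, T init, Op op) {
    T acc = init;
    for (std::size_t i = 0; i < N; ++i) {
        acc = op(acc, in[i]);  // each iteration depends on the previous one
    }
    return acc;
}

// Composition (assumed example): a larger block built only from the two
// templates above, itself parameterized by T and N so it can be reused.
template <typename T, std::size_t N>
T sum_of_squares(const std::array<T, N>& in) {
    auto squared = map_template(in, [](T x) { return x * x; });
    return reduce_template(squared, T{0}, [](T a, T b) { return a + b; });
}
```

Calling `sum_of_squares` on a `std::array<int, 4>` holding 1, 2, 3, 4 returns 30. In this sketch the composed block is itself a template, so it could serve as a building block at the next level of a hierarchy, which is the property the abstract describes.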