GPU-Accelerated High-Level Synthesis for Bitwidth Optimization of FPGA Datapaths

Bitwidth optimization of FPGA datapaths can save hardware resources by choosing the fewest bits required for each datapath variable to achieve a desired quality of result. However, it is an NP-hard problem that requires unacceptably long runtimes when using sequential CPU-based heuristics. We show how to parallelize the key steps of bitwidth optimization on the GPU by performing a fast brute-force search over a carefully constrained search space. We develop a high-level synthesis methodology suitable for rapid prototyping of bitwidth-annotated RTL code generation using gcc's GIMPLE backend. For range analysis, we evaluate sub-intervals in parallel to obtain tighter bounds than ordinary interval arithmetic provides. For bitwidth allocation, we enumerate the different bitwidth combinations in parallel by assigning each combination to a GPU thread. We demonstrate speedups of 10–1000x for range analysis and 50–200x for bitwidth allocation when comparing an NVIDIA K20 GPU implementation against an Intel Core i5-4570 CPU, while maintaining identical solution quality across various benchmarks. This allows us to generate tailor-made RTL with minimum bitwidths in hundreds of milliseconds instead of hundreds of minutes when starting from high-level C descriptions of dataflow computations.
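The two parallelized steps described above can be illustrated with a minimal sequential Python sketch. It is not the GPU implementation from this work: the example datapath y = x(1 - x), the additive error model (each variable contributes gain * 2^-bits), the cost model (sum of bits), and all function names are assumptions chosen only for illustration. Each loop iteration stands in for the work that would be assigned to one GPU thread.

```python
from itertools import product

def imul(a, b):
    """Interval product: min/max over the four endpoint products."""
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))

def isub(a, b):
    """Interval difference a - b."""
    return (a[0] - b[1], a[1] - b[0])

def f_interval(x):
    """Hypothetical datapath y = x * (1 - x) in interval arithmetic."""
    return imul(x, isub((1.0, 1.0), x))

def range_by_subintervals(lo, hi, n):
    """Split [lo, hi] into n equal pieces, evaluate each piece
    independently, and return the hull of the partial results."""
    width = (hi - lo) / n
    pieces = [f_interval((lo + i * width, lo + (i + 1) * width))
              for i in range(n)]
    return (min(p[0] for p in pieces), max(p[1] for p in pieces))

def allocate_bitwidths(gains, max_error, min_bits=1, max_bits=8):
    """Brute-force search: try every bitwidth combination and keep the
    cheapest one whose (assumed) error stays within max_error.
    Returns (cost, combination), or None if nothing is feasible."""
    best = None
    for combo in product(range(min_bits, max_bits + 1), repeat=len(gains)):
        # Assumed error model: variable i contributes gains[i] * 2^-bits.
        error = sum(g * 2.0 ** -b for g, b in zip(gains, combo))
        if error <= max_error:
            cost = sum(combo)  # assumed cost model: total number of bits
            if best is None or cost < best[0]:
                best = (cost, combo)
    return best

if __name__ == "__main__":
    print(range_by_subintervals(0.0, 1.0, 1))   # plain interval arithmetic
    print(range_by_subintervals(0.0, 1.0, 4))   # four sub-intervals
    print(allocate_bitwidths([1.0, 4.0], 2.0 ** -4))
```

With one interval the sketch reports the range [0, 1] for y, while four sub-intervals narrow it to [0, 0.375]; the true range is [0, 0.25], so finer splitting moves the bound toward the exact answer. Because the pieces and the bitwidth combinations are evaluated independently of one another, both loops map naturally onto one-thread-per-item GPU execution followed by a min/max reduction.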
