HPVM2FPGA: Enabling True Hardware-Agnostic FPGA Programming

Current FPGA programming tools require extensive hardware-specific manual code tuning to achieve good performance, which is impractical for most software application teams. We present HPVM2FPGA, a novel end-to-end compiler and auto-tuning system that automatically tunes hardware-agnostic programs for FPGAs. HPVM2FPGA represents these programs in an intermediate representation (IR) built on a hardware-agnostic abstraction of parallelism. Its optimization framework combines compiler optimizations with design space exploration (DSE) to tune a hardware-agnostic program for a given FPGA automatically. HPVM2FPGA thus supports software programmers by shifting the burden of hardware-specific optimization to the compiler and DSE. We show that HPVM2FPGA achieves up to a 33× speedup over unoptimized baselines and matches the performance of hand-tuned HLS code for three of four benchmarks. We have designed HPVM2FPGA to be a modular and extensible framework, and we expect it to match hand-tuned code for most programs as the system matures with more optimizations. Overall, we believe that it is a solid step toward fully hardware-agnostic FPGA programming and a suitable cornerstone for future FPGA compiler research.
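As a rough illustration of what design space exploration means in this setting, the following minimal sketch searches a small space of optimization settings under a resource budget. It is not HPVM2FPGA's actual algorithm or API: the knobs (`UNROLL_FACTORS`, `FUSE_LOOPS`), the budget `DSP_BUDGET`, and the analytical model in `estimate` are hypothetical stand-ins chosen only to make the idea concrete. A real system would obtain such estimates from the compiler or from synthesis reports.

```python
from itertools import product

# Hypothetical tuning knobs; not taken from HPVM2FPGA itself.
UNROLL_FACTORS = [1, 2, 4, 8, 16]
FUSE_LOOPS = [False, True]

# Assumed resource limit of the target FPGA (illustrative value).
DSP_BUDGET = 64


def estimate(unroll, fuse):
    """Toy analytical model returning (runtime, dsp_usage).

    Unrolling divides the loop work but costs DSP blocks;
    fusion removes a fixed overhead but also costs DSP blocks.
    """
    work = 1024.0
    runtime = work / unroll + (0.0 if fuse else 20.0)
    dsp_usage = 6 * unroll + (8 if fuse else 0)
    return runtime, dsp_usage


def explore():
    """Exhaustively search the space and keep the fastest feasible point."""
    best = None
    for unroll, fuse in product(UNROLL_FACTORS, FUSE_LOOPS):
        runtime, dsp = estimate(unroll, fuse)
        if dsp > DSP_BUDGET:
            continue  # design does not fit on the assumed device
        if best is None or runtime < best[0]:
            best = (runtime, unroll, fuse)
    return best


# Unoptimized baseline: no unrolling, no fusion.
baseline_runtime, _ = estimate(1, False)
best_runtime, best_unroll, best_fuse = explore()
speedup = baseline_runtime / best_runtime
print(f"best: unroll={best_unroll}, fuse={best_fuse}, speedup={speedup:.2f}x")
```

Exhaustive search is used here only because the toy space has ten points. Practical design spaces are far larger, which is why model-guided search strategies are typically used instead.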
