Design Space exploration of FPGA-based accelerators with multi-level parallelism

Applications containing compute-intensive kernels with nested loops can effectively leverage FPGAs to exploit fine-and coarse-grained parallelism. HLS tools used to translate these kernels from high-level languages (e.g., C/C−−), however, are inefficient in exploiting multiple levels of parallelism automatically, thereby producing sub-optimal accelerators. Moreover, the large design space resulting from the various combinations of fineand coarse-grained parallelism options makes exhaustive design space exploration prohibitively time-consuming with HLS tools. Hence, we propose a rapid estimation framework, MPSeeker, to evaluate performance/area metrics of various accelerator options for an application at an early design phase. Experimental results show that MPSeeker can rapidly (in minutes) explore the complex design space and accurately estimate performance/area of various design points to identify the near-optimal (95.7% performance of the optimal on average) combination of parallelism options.

[1]  Yun Liang,et al.  Design space exploration of multiple loops on FPGAs using high level synthesis , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).

[2]  Xitong Gao,et al.  Automatically Optimizing the Latency, Area, and Accuracy of C Programs for High-Level Synthesis , 2016, FPGA.

[3]  Yun Liang,et al.  Lin-Analyzer: A high-level performance analysis tool for FPGA-based accelerators , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[4]  Kunle Olukotun,et al.  Automatic Generation of Efficient Accelerators for Reconfigurable Hardware , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[5]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[6]  Uday Bondhugula,et al.  A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.

[7]  Wei Zhang,et al.  A performance analysis framework for optimizing OpenCL applications on FPGAs , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[8]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[9]  Todd M. Austin,et al.  Dynamic dependency analysis of ordinary programs , 1992, ISCA '92.

[10]  F. H. Mcmahon,et al.  The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range , 1986 .

[12]  Jason Cong,et al.  Resource-Aware Throughput Optimization for High-Level Synthesis , 2015, FPGA.

[13]  Vlad Mihai Sima,et al.  Quipu: A Statistical Model for Predicting Hardware Resources , 2013, TRETS.

[14]  Jason Cong,et al.  Multilevel Granularity Parallelism Synthesis on FPGAs , 2011, 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines.