
Assisting High-Level Synthesis Improve SpMV Benchmark Through Dynamic Dependence Analysis

Recent advances in high-level synthesis (HLS) have enabled the automatic generation of register-transfer level (RTL) designs from high-level specifications without compromising performance. HLS substantially improves designer productivity and is a promising approach to designing future heterogeneous chips consisting of dozens of unique IP blocks (i.e., hardware accelerators). Despite their impressive capabilities, HLS tools today are commonly used to target only a small subset of workloads, namely those with highly regular control flow and memory access patterns. The difficulty of achieving high-quality hardware for irregular workloads stems from the reliance of HLS on static analysis. Static analysis is overly conservative when dealing with non-uniform memory accesses and imbalanced workloads, and when identifying the most appropriate parallelization strategy. In this brief, we propose the use of dynamic analysis to generate higher-quality designs using commercial HLS tools. Our evaluations show that with dynamic dependence analysis, HLS designs achieve a $3.3\times$ performance improvement on the sparse matrix-vector multiplication (SpMV) benchmark.
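To make the contrast with static analysis concrete, here is a minimal Python sketch of the general idea behind trace-based dependence analysis, assuming a compressed sparse row (CSR) input. It is an illustration only, not the authors' tool, and `spmv_csr_traced` and `cross_iteration_deps` are hypothetical helpers. Because the column index `col_idx[j]` is known only at run time, a static tool must conservatively assume that accesses may conflict; running the kernel on a sample input and recording which addresses each row iteration reads and writes lets one check directly whether any iteration depends on another.

```python
# Hypothetical sketch: dynamic (trace-based) dependence analysis of CSR SpMV.
# Array names and helper functions are illustrative assumptions, not the
# paper's tool or any HLS vendor's API.
from itertools import combinations


def spmv_csr_traced(row_ptr, col_idx, vals, x):
    """Compute y = A*x for a CSR matrix and record, for each outer (row)
    iteration, the set of (array, index) addresses it reads and writes."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    reads, writes = [], []
    for i in range(n_rows):
        r = {("row_ptr", i), ("row_ptr", i + 1)}
        acc = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            c = col_idx[j]  # data-dependent index: unknown to static analysis
            r |= {("col_idx", j), ("vals", j), ("x", c)}
            acc += vals[j] * x[c]
        y[i] = acc
        reads.append(r)
        writes.append({("y", i)})
    return y, reads, writes


def cross_iteration_deps(reads, writes):
    """List every (kind, earlier, later, address) RAW/WAR/WAW dependence
    observed between two different outer-loop iterations of the trace."""
    deps = []
    for a, b in combinations(range(len(reads)), 2):  # pairs with a < b
        for addr in writes[a] & reads[b]:
            deps.append(("RAW", a, b, addr))
        for addr in reads[a] & writes[b]:
            deps.append(("WAR", a, b, addr))
        for addr in writes[a] & writes[b]:
            deps.append(("WAW", a, b, addr))
    return deps


if __name__ == "__main__":
    # 4x4 example matrix with uneven row lengths (nonzeros per row: 3, 0, 1, 2)
    row_ptr = [0, 3, 3, 4, 6]
    col_idx = [0, 1, 3, 2, 0, 3]
    vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x = [1.0, 1.0, 1.0, 1.0]

    y, reads, writes = spmv_csr_traced(row_ptr, col_idx, vals, x)
    print("y =", y)
    print("nonzeros per row =", [row_ptr[i + 1] - row_ptr[i] for i in range(len(y))])
    print("cross-row dependences:", cross_iteration_deps(reads, writes))
```

For this sample matrix the trace records no cross-row dependence, since each row iteration writes only its own `y[i]` and shared addresses such as `x[c]` are only read. That suggests the outer loop could be parallelized, while the per-row nonzero counts expose the workload imbalance that static analysis cannot see. Because a trace covers only the inputs it was collected on, any conclusion drawn this way holds for those inputs and should be checked against representative data.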
