Nuclear Reactor Simulation on OpenCL FPGA: a Case Study of RSBench

Field-programmable gate arrays (FPGAs) are becoming a promising heterogeneous computing component for scientific computing as floating-point-optimized architectures are added to current devices. Emerging high-level synthesis (HLS) tools such as the Intel OpenCL SDK for FPGA offer a streamlined design flow that facilitates the use of FPGAs in scientific computing. Investigating the characteristics of supercomputing applications, such as nuclear reactor simulation, under this emerging HLS development flow helps researchers evaluate and adopt FPGA-based heterogeneous programming models in research facilities and laboratories. In this paper, we evaluate the OpenCL-based FPGA design of RSBench, a nuclear reactor simulation application. We describe the OpenCL implementations and optimization methods on an Intel Arria 10-based FPGA platform. Compared with the naïve OpenCL kernel, the optimized kernel improves performance by a factor of 295 on the FPGA. Compared with a 16-core Intel Xeon CPU and an Nvidia K80 GPU, the performance per watt of the FPGA is 3.59× better than that of the CPU and 5.8× lower than that of the GPU.
