Paper information - AutoRARE: An Automated Tool For Generating FPGA-Based Multi-Memory Hardware Accelerators For Compute-Intensive Applications

In this paper, we present AutoRARE, a Java-based automated design tool for generating Field Programmable Gate Array (FPGA)-based hardware accelerators. AutoRARE automatically generates all the VHDL models needed to build and synthesize a processor tailored to each application. The user need only provide the VHDL description of a special-purpose floating-point Arithmetic Logic Unit (ALU) or function core; the tool generates the VHDL descriptions of the memory interface, memory controller, host processor interface, and application-specific processor. We also present details of an FPGA-based multi-memory hardware accelerator for computationally intensive applications, generated using AutoRARE. The accelerator is highly pipelined and can simultaneously read and write multiple floating-point values from multiple memories. This multi-memory architecture is the key to providing hardware accelerators that execute 10X-100X faster than typical multi-core processors. The Taylor Series expansion of the sine/cosine function serves as the application that demonstrates the merits of the multi-memory hardware accelerator. In our experiments, we executed the Taylor Series in software and compared its execution time with that of an FPGA-based hardware implementation. The results show that the FPGA-based multi-memory Taylor Series hardware accelerator is 481X faster than software executing the Taylor Series on a typical server.
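For readers unfamiliar with the benchmark, the kind of software baseline the abstract describes can be sketched as a truncated Taylor (Maclaurin) series for sine and cosine. This is a minimal illustration only: the function names `taylor_sin` and `taylor_cos`, the default of 10 terms, and the term-by-term recurrence are assumptions made here, since the abstract does not state how many terms were used or how the authors' software or hardware evaluates the series.

```python
import math


def taylor_sin(x: float, terms: int = 10) -> float:
    """Approximate sin(x) with the first `terms` nonzero Maclaurin terms.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    term = x      # first term: x^1 / 1!
    total = x
    for k in range(1, terms):
        # Next term = previous term * (-x^2) / ((2k) * (2k + 1))
        term *= -x * x / ((2 * k) * (2 * k + 1))
        total += term
    return total


def taylor_cos(x: float, terms: int = 10) -> float:
    """Approximate cos(x) with the first `terms` nonzero Maclaurin terms.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    term = 1.0    # first term: x^0 / 0!
    total = 1.0
    for k in range(1, terms):
        # Next term = previous term * (-x^2) / ((2k - 1) * (2k))
        term *= -x * x / ((2 * k - 1) * (2 * k))
        total += term
    return total


if __name__ == "__main__":
    for x in (0.1, 0.5, 1.0):
        print(x, taylor_sin(x), math.sin(x), taylor_cos(x), math.cos(x))
```

Evaluating such a series over a large array of inputs is dominated by floating-point multiplies and adds on independent elements, which is the sort of workload where reading and writing several operands per cycle from separate memories could plausibly help.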

Youngsoo Kim | Shrikant S. Jadhav | Clay Gloster | Jannatun Naher | Christopher Doss
