Design and Application Space Exploration of a Domain-Specific Accelerator System

Domain-specific accelerators are a response to the limits of device scaling and the dark silicon era. This paper describes a configurable accelerator oriented toward radar signal processing and the application space exploration of the resulting system. The system is built around accelerator engines and general-purpose processors (GPPs), which makes it suitable for both accelerating compute-intensive kernels and handling complex control tasks. It is geared toward high-performance radar digital signal processing; we characterize the target applications and find that each of them consists of a series of serializable kernels. Exploiting this observation, we design an algorithm pool whose algorithms share the same computation and memory resources, and each algorithm is size-reconfigurable. In addition, a shared, addressable on-chip scratchpad memory eliminates unnecessary explicit data copies between accelerators. System performance is evaluated from measurements on both an FPGA SoC test chip and a prototype chip fabricated in 40 nm CMOS technology. The experimental results show that, depending on the algorithm, the proposed system achieves a 1.9× to 10.1× performance gain over a state-of-the-art TI DSP chip. To characterize the application space of the system, we run a complex real-life task; the results show that the system achieves high throughput and satisfactory precision.
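The idea of serializable kernels operating on one shared, addressable scratchpad can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design described in the paper: the `Scratchpad` class, the kernel functions, the address layout, and the naive DFT standing in for an FFT engine are all hypothetical names and choices introduced here for illustration.

```python
import cmath


class Scratchpad:
    """Hypothetical shared, addressable memory modelled as a flat list of words."""

    def __init__(self, words):
        self.mem = [0j] * words


def dft_kernel(sp, src, dst, n):
    """Naive O(n^2) DFT standing in for an FFT engine; n is the configurable size.

    src and dst must not overlap, because inputs are re-read for every output bin.
    """
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += sp.mem[src + t] * cmath.exp(-2j * cmath.pi * k * t / n)
        sp.mem[dst + k] = acc


def magnitude_kernel(sp, src, dst, n):
    """Element-wise magnitude of n words starting at src, written to dst."""
    for i in range(n):
        sp.mem[dst + i] = abs(sp.mem[src + i])


def run_serial(sp, schedule):
    """Run kernels one after another; each reads its predecessor's output in place."""
    for kernel, src, dst, n in schedule:
        kernel(sp, src, dst, n)


# Example: an 8-point constant signal, transformed and then reduced to magnitudes.
n = 8
sp = Scratchpad(3 * n)
for i in range(n):
    sp.mem[i] = 1 + 0j          # input region: addresses 0..7

schedule = [
    (dft_kernel, 0, n, n),            # spectrum written to addresses 8..15
    (magnitude_kernel, n, 2 * n, n),  # magnitudes written to addresses 16..23
]
run_serial(sp, schedule)
result = sp.mem[2 * n:3 * n]
```

In this model the second kernel consumes the first kernel's output directly at its scratchpad address, so no intermediate buffer is copied between kernels, and changing `n` resizes every stage of the chain.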
