Deep Neural Network Hardware Implementation Based on Stacked Sparse Autoencoder

Deep learning techniques have gained prominence in the research community in recent years; however, deep learning algorithms have a high computational cost, which makes them difficult to apply in many commercial settings. To address this, alternative approaches have been studied, and methodologies for accelerating complex algorithms, including those based on reconfigurable hardware, have shown significant results. The objective of this paper is therefore to propose a neural network hardware implementation for use in deep learning applications. The implementation was developed on a field-programmable gate array (FPGA) and supports deep neural networks (DNNs) trained with the stacked sparse autoencoder (SSAE) technique. To accommodate DNNs with many inputs and layers on the FPGA, the systolic array technique was used throughout the architecture. Details of the design are presented, along with the hardware area occupation and the processing time of two different implementations. The results show that both implementations achieve high throughput, enabling deep learning techniques to be applied to problems involving large amounts of data.
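The abstract does not detail how the systolic array maps onto the SSAE-trained network, so the following is only a minimal software sketch of the general idea: one processing element (PE) per neuron, inputs streamed in one per clock step, and all PEs performing a multiply-accumulate in parallel. The layer sizes and the logistic activation are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

def systolic_matvec(W, x):
    """Simulate a 1-D systolic array computing y = W @ x.

    Each row of W is mapped to one processing element (PE).
    At every clock step, the next input sample is broadcast and
    every PE performs one multiply-accumulate in parallel, so a
    layer with n inputs finishes after n steps regardless of the
    number of neurons.
    """
    n_neurons, n_inputs = W.shape
    acc = np.zeros(n_neurons)          # one accumulator per PE
    for step in range(n_inputs):       # inputs stream in, one per cycle
        acc += W[:, step] * x[step]    # all PEs MAC simultaneously
    return acc

def ssae_forward(x, layers):
    """Inference through a DNN trained layer-wise as a stacked
    sparse autoencoder: each layer is (W, b) followed by a
    logistic activation (an assumed choice for this sketch)."""
    for W, b in layers:
        x = 1.0 / (1.0 + np.exp(-(systolic_matvec(W, x) + b)))
    return x
```

In hardware, the per-step MAC inside `systolic_matvec` would be one pipelined DSP operation per PE; the software loop only models the timing behavior, not the fixed-point arithmetic an FPGA design would typically use.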
