FPGA-Based Accelerators of Deep Learning Networks for Learning and Classification: A Review

Due to recent advances in digital technologies, and availability of credible data, an area of artificial intelligence, deep learning, has emerged and has demonstrated its ability and effectiveness in solving complex learning problems not possible before. In particular, convolutional neural networks (CNNs) have demonstrated their effectiveness in the image detection and recognition applications. However, they require intensive CPU operations and memory bandwidth that make general CPUs fail to achieve the desired performance levels. Consequently, hardware accelerators that use application-specific integrated circuits, field-programmable gate arrays (FPGAs), and graphic processing units have been employed to improve the throughput of CNNs. More precisely, FPGAs have been recently adopted for accelerating the implementation of deep learning networks due to their ability to maximize parallelism and their energy efficiency. In this paper, we review the recent existing techniques for accelerating deep learning networks on FPGAs. We highlight the key features employed by the various techniques for improving the acceleration performance. In addition, we provide recommendations for enhancing the utilization of FPGAs for CNNs acceleration. The techniques investigated in this paper represent the recent trends in the FPGA-based accelerators of deep learning networks. Thus, this paper is expected to direct the future advances on efficient hardware accelerators and to be useful for deep learning researchers.

[1]  Quan Chen,et al.  DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[2]  Patrice Y. Simard,et al.  High Performance Convolutional Neural Networks for Document Processing , 2006 .

[3]  Dong Yu,et al.  Deep Learning: Methods and Applications , 2014, Found. Trends Signal Process..

[4]  S. Winograd Arithmetic complexity of computations , 1980 .

[5]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[6]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[7]  E.A. Lee,et al.  Synchronous data flow , 1987, Proceedings of the IEEE.

[8]  Jason Cong,et al.  High-Level Synthesis for FPGAs: From Prototyping to Deployment , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[9]  Yoshua Bengio,et al.  BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 , 2016, ArXiv.

[10]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[11]  Yu Cao,et al.  Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks , 2017, FPGA.

[12]  Vivienne Sze,et al.  Hardware for machine learning: Challenges and opportunities , 2017, 2017 IEEE Custom Integrated Circuits Conference (CICC).

[13]  William J. Dally,et al.  GPUs and the Future of Parallel Computing , 2011, IEEE Micro.

[14]  Yun Liang,et al.  High-Level Synthesis: Productivity, Performance, and Software Constraints , 2012, J. Electr. Comput. Eng..

[15]  Srihari Cadambi,et al.  A programmable parallel accelerator for learning and classification , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[16]  Viktor Prasanna,et al.  Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System , 2017, FPGA.

[17]  Mohamed S. Abdelfattah,et al.  DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).

[18]  Shijie Li,et al.  Throughput-Optimized FPGA Accelerator for Deep Convolutional Neural Networks , 2017, ACM Trans. Reconfigurable Technol. Syst..

[19]  Joan Bruna,et al.  Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation , 2014, NIPS.

[20]  David F. Bacon,et al.  Compiler transformations for high-performance computing , 1994, CSUR.

[21]  Yanjun Qi,et al.  Learning to rank with (a lot of) word features , 2010, Information Retrieval.

[22]  Karin Strauss,et al.  Accelerating Deep Convolutional Neural Networks Using Specialized Hardware , 2015 .

[23]  Srihari Cadambi,et al.  A Massively Parallel FPGA-Based Coprocessor for Support Vector Machines , 2009, 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines.

[24]  Urs A. Muller,et al.  A multi-range vision strategy for autonomous offroad navigation , 2007 .

[25]  Samuel Williams,et al.  Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.

[26]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[27]  David G. Lowe,et al.  Multiclass Object Recognition with Sparse, Localized Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[28]  Jason Cong,et al.  Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[29]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[30]  Xuehai Zhou,et al.  PuDianNao: A Polyvalent Machine Learning Accelerator , 2015, ASPLOS.

[31]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[32]  Paris Smaragdis,et al.  Bitwise Neural Networks , 2016, ArXiv.

[33]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Christos-Savvas Bouganis,et al.  Latency-driven design for FPGA-based convolutional neural networks , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).

[35]  Henk Corporaal,et al.  Memory-centric accelerator design for Convolutional Neural Networks , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).

[36]  Christos-Savvas Bouganis,et al.  fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs , 2016, 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[37]  Gang Hua,et al.  A convolutional neural network cascade for face detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Michele Magno,et al.  Accelerating real-time embedded scene labeling with convolutional networks , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[40]  F. Cardells-Tormo,et al.  Area-efficient 2D shift-variant convolvers for FPGA-based digital image processing , 2005, International Conference on Field Programmable Logic and Applications, 2005..

[41]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[42]  John Freeman,et al.  OpenCL for FPGAs: Prototyping a Compiler , 2013 .

[43]  Stacy Holman Jones Torch , 1999 .

[44]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[45]  Yann LeCun,et al.  An FPGA-based stream processor for embedded real-time vision with Convolutional Networks , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[46]  Beatriz Blanco-Filgueira,et al.  Deep Learning-Based Multiple Object Visual Tracking on Embedded System for IoT and Mobile Edge Computing Applications , 2018, IEEE Internet of Things Journal.

[47]  John Tran,et al.  cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.

[48]  Xiaowei Li,et al.  FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[49]  Trishul M. Chilimbi,et al.  Project Adam: Building an Efficient and Scalable Deep Learning Training System , 2014, OSDI.

[50]  Geoffrey E. Hinton,et al.  Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[51]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[52]  Misha Denil,et al.  Predicting Parameters in Deep Learning , 2014 .

[53]  Yann LeCun,et al.  Off-Road Obstacle Avoidance through End-to-End Learning , 2005, NIPS.

[54]  Song Han,et al.  ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA , 2016, FPGA.

[55]  Rafael Gadea Gironés,et al.  FPGA Implementation of a Pipelined On-Line Backpropagation , 2005, J. VLSI Signal Process..

[56]  Shawki Areibi,et al.  The Impact of Arithmetic Representation on Implementing MLP-BP on FPGAs: A Study , 2007, IEEE Transactions on Neural Networks.

[57]  Jason Cong,et al.  Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[58]  Berin Martini,et al.  Large-Scale FPGA-based Convolutional Networks , 2011 .

[59]  Mohamed S. Abdelfattah,et al.  Gzip on a chip: high performance lossless data compression on FPGAs using OpenCL , 2014, IWOCL '14.

[60]  Yu Cao,et al.  Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks , 2016, FPGA.

[61]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[62]  Yann LeCun,et al.  Comparing SVM and convolutional networks for epileptic seizure prediction from intracranial EEG , 2008, 2008 IEEE Workshop on Machine Learning for Signal Processing.

[63]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[64]  Xi Chen,et al.  FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[65]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[66]  Indranil Saha,et al.  journal homepage: www.elsevier.com/locate/neucom , 2022 .

[67]  Miao Hu,et al.  ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[68]  Yu Cao,et al.  Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).

[69]  M H Alsuwaiyel Algorithms: Design Techniques and Analysis (Revised Edition) , 2016 .

[70]  Qi Yu,et al.  DLAU: A Scalable Deep Learning Accelerator Unit on FPGA , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[71]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[72]  Luis Herranz,et al.  Heterogeneous Convolutional Neural Networks for Visual Recognition , 2016, PCM.

[73]  Yann LeCun,et al.  Learning long‐range vision for autonomous off‐road driving , 2009, J. Field Robotics.

[74]  Aaftab Munshi,et al.  The OpenCL specification , 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).

[75]  Jason Helge Anderson,et al.  LegUp: high-level synthesis for FPGA-based processor/accelerator systems , 2011, FPGA '11.

[76]  Tianshi Chen,et al.  ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[77]  Joel Emer,et al.  Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks , 2016, CARN.

[78]  Berin Martini,et al.  A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[79]  Yann LeCun,et al.  Optimal Brain Damage , 1989, NIPS.

[80]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[81]  Eriko Nurvitadhi,et al.  Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks? , 2017, FPGA.

[82]  Shuchang Zhou,et al.  DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients , 2016, ArXiv.

[83]  Jason Cong,et al.  Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.

[84]  Douglas L. Maskell,et al.  Efficient Overlay Architecture Based on DSP Blocks , 2015, 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines.

[85]  Yann LeCun,et al.  CNP: An FPGA-based processor for Convolutional Networks , 2009, 2009 International Conference on Field Programmable Logic and Applications.

[86]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[87]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[88]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[89]  Srihari Cadambi,et al.  A Massively Parallel Digital Learning Processor , 2008, NIPS.

[90]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[91]  Hui Zhang,et al.  A Multiwindow Partial Buffering Scheme for FPGA-Based 2-D Convolvers , 2007, IEEE Transactions on Circuits and Systems II: Express Briefs.

[92]  Tao Zhang,et al.  PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[93]  Ran El-Yaniv,et al.  Binarized Neural Networks , 2016, NIPS.

[94]  Philip Heng Wai Leong,et al.  FINN: A Framework for Fast, Scalable Binarized Neural Network Inference , 2016, FPGA.

[95]  Jack J. Dongarra,et al.  Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.

[96]  Atsushi Sato,et al.  Generalized Learning Vector Quantization , 1995, NIPS.

[97]  Paulo J. G. Lisboa,et al.  Artificial Neural Networks in Biomedicine , 2000, Perspectives in Neural Computing.

[98]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[99]  Ronald Davis,et al.  Neural networks and deep learning , 2017 .

[100]  Shawki Areibi,et al.  Deep Learning on FPGAs: Past, Present, and Future , 2016, ArXiv.

[101]  Stefan Wermter,et al.  A Multichannel Convolutional Neural Network for Hand Posture Recognition , 2014, ICANN.

[102]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[103]  Thomas Serre,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[104]  Jason Cong,et al.  Improving high level synthesis optimization opportunity through polyhedral transformations , 2013, FPGA '13.

[105]  Shuai Wang,et al.  Deep learning for sentiment analysis: A survey , 2018, WIREs Data Mining Knowl. Discov..

[106]  Wim Vanderbauwhede,et al.  High-Performance Computing Using FPGAs , 2013 .

[107]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[108]  Sadiq M. Sait,et al.  Iterative computer algorithms with applications in engineering - solving combinatorial optimization problems , 2000 .

[109]  Jia Wang,et al.  DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[110]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[111]  P. L. Montgomery,et al.  A survey of modern integer factorization algorithms , 1994 .

[112]  Forrest N. Iandola,et al.  SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[113]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[114]  Steven Pigeon,et al.  VIP: an FPGA-based processor for image processing and neural networks , 1996, Proceedings of Fifth International Conference on Microelectronics for Neural Networks.

[115]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[116]  Hadi Esmaeilzadeh,et al.  Neural acceleration for GPU throughput processors , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[117]  Khaled Benkrid,et al.  Design and implementation of a 2D convolution core for video applications on FPGAs , 2002, Third International Workshop on Digital and Computational Video, 2002. DCV 2002. Proceedings..

[118]  Luis Ceze,et al.  Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.

[119]  Tara N. Sainath,et al.  Improving deep neural networks for LVCSR using rectified linear units and dropout , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[120]  Denis F. Wolf,et al.  USING EMBEDDED PROCESSORS IN HARDWARE MODELS OF ARTIFICIAL NEURAL NETWORKS , 2001 .

[121]  Gerald Penn,et al.  Convolutional Neural Networks for Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[122]  Jie Xu,et al.  DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[123]  C. Loan Computational Frameworks for the Fast Fourier Transform , 1992 .

[124]  Jing Li,et al.  Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network , 2017, FPGA.

[125]  Vivienne Sze,et al.  Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.

[126]  Judith E. Dayhoff,et al.  Neural Network Architectures: An Introduction , 1989 .

[127]  Berin Martini,et al.  NeuFlow: A runtime reconfigurable dataflow processor for vision , 2011, CVPR 2011 WORKSHOPS.

[128]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[129]  James A. Anderson,et al.  Neurocomputing: Foundations of Research , 1988 .

[130]  Takashi Morie,et al.  Projection-Field-Type VLSI Convolutional Neural Networks Using Merged/Mixed Analog-Digital Approach , 2007, ICONIP.

[131]  Alan L. Yuille,et al.  Genetic CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[132]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[133]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[134]  Geoffrey E. Hinton,et al.  Application of Deep Belief Networks for Natural Language Understanding , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[135]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[136]  Jason Cong,et al.  Polyhedral-based data reuse optimization for configurable computing , 2013, FPGA '13.

[137]  L. Bottou,et al.  Deep Convolutional Networks for Scene Parsing , 2009 .

[138]  Tianshi Chen,et al.  DaDianNao: A Neural Network Supercomputer , 2017, IEEE Transactions on Computers.

[139]  Larry P. Heck,et al.  Learning deep structured semantic models for web search using clickthrough data , 2013, CIKM.

[140]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[141]  Christoforos E. Kozyrakis,et al.  Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.

[142]  Gernot A. Fink,et al.  Face Detection Using GPU-Based Convolutional Neural Networks , 2009, CAIP.

[143]  Song Han,et al.  EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[144]  Andrew C. Ling,et al.  An OpenCL(TM) Deep Learning Accelerator on Arria 10 , 2017 .

[145]  Yu Cao,et al.  An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).

[146]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[147]  John E. Stone,et al.  OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.

[148]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[149]  Martin C. Herbordt,et al.  Computing Models for FPGA-Based Accelerators , 2008, Computing in Science & Engineering.

[150]  Jagath C. Rajapakse,et al.  FPGA Implementations of Neural Networks , 2006 .

[151]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[152]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[153]  Andrew C. Ling,et al.  An OpenCL™ Deep Learning Accelerator on Arria 10 , 2017, FPGA.

[154]  David Gregg,et al.  Parallel Multi Channel convolution using General Matrix Multiplication , 2017, 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP).

[155]  Srihari Cadambi,et al.  A Massively Parallel Coprocessor for Convolutional Neural Networks , 2009, 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors.

[156]  Francisco Cardells-Tormo,et al.  Area-efficient 2-D shift-variant convolvers for FPGA-based digital image processing , 2005, IEEE Workshop on Signal Processing Systems Design and Implementation, 2005..

[157]  Andrew Lavin,et al.  Fast Algorithms for Convolutional Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[158]  Ah Chung Tsoi,et al.  Face recognition: a convolutional neural-network approach , 1997, IEEE Trans. Neural Networks.

[159]  T. Morie,et al.  An image filtering processor for face/object recognition using merged/mixed analog-digital architecture , 2005, Digest of Technical Papers. 2005 Symposium on VLSI Circuits, 2005..

[160]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[161]  Ninghui Sun,et al.  DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.

[162]  Yu Cao,et al.  ALAMO: FPGA acceleration of deep learning algorithms with a modularized RTL compiler , 2018, Integr..

[163]  M. Balakrishnan,et al.  Architecture Exploration of FPGA Based Accelerators for BioInformatics Applications , 2016 .

[164]  Lorien Y. Pratt,et al.  Comparing Biases for Minimal Network Construction with Back-Propagation , 1988, NIPS.

[165]  Shengen Yan,et al.  Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[166]  C. Reeves Modern heuristic techniques for combinatorial problems , 1993 .

[167]  Yu Cao,et al.  Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA , 2018, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[168]  Michael Ferdman,et al.  Maximizing CNN accelerator efficiency through resource partitioning , 2016, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[169]  Babak Hassibi,et al.  Second Order Derivatives for Network Pruning: Optimal Brain Surgeon , 1992, NIPS.

[170]  Jason Cong,et al.  Minimizing Computation in Convolutional Neural Networks , 2014, ICANN.

[171]  Mohamad Ivan Fanany,et al.  Metaheuristic Algorithms for Convolution Neural Network , 2016, Comput. Intell. Neurosci..

[172]  Yu Wang,et al.  Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[173]  Jason Cong,et al.  Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster , 2016, ISLPED.

[174]  Berin Martini,et al.  Hardware accelerated convolutional neural networks for synthetic vision systems , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.

[175]  Yu Cao,et al.  End-to-end scalable FPGA accelerator for deep residual networks , 2017, 2017 IEEE International Symposium on Circuits and Systems (ISCAS).

[176]  Masanori Hariyama,et al.  Design of FPGA-Based Computing Systems with OpenCL , 2017 .

[177]  Yvon Savaria,et al.  Reconfigurable pipelined 2-D convolvers for fast digital signal processing , 1999, IEEE Trans. Very Large Scale Integr. Syst..

[178]  J. Dixon Asymptotically fast factorization of integers , 1981 .

[179]  Xuegong Zhou,et al.  A high performance FPGA-based accelerator for large-scale convolutional neural networks , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).

[180]  Mark S. Rzepczynski Neural Networks in Finance: Gaining Predictive Edge in the Markets (a review) , 2007 .

[181]  Wojciech Zaremba,et al.  Recurrent Neural Network Regularization , 2014, ArXiv.

[182]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[183]  Yann LeCun,et al.  A multirange architecture for collision‐free off‐road robot navigation , 2009, J. Field Robotics.

[184]  Ilya Kostrikov,et al.  PlaNet - Photo Geolocation with Convolutional Neural Networks , 2016, ECCV.

[185]  Jun Zhao,et al.  Recurrent Convolutional Neural Networks for Text Classification , 2015, AAAI.

[186]  Peng Zhang,et al.  Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[187]  Wonyong Sung,et al.  Resiliency of Deep Neural Networks under Quantization , 2015, ArXiv.

[188]  Srihari Cadambi,et al.  A dynamically configurable coprocessor for convolutional neural networks , 2010, ISCA.

[189]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[190]  Yu Wang,et al.  Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.

[191]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[192]  Yen-Cheng Kuan,et al.  A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things , 2017, IEEE Transactions on Circuits and Systems I: Regular Papers.

[193]  Matthew J. Hausknecht,et al.  Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).