FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

Research has shown that convolutional neural networks contain significant redundancy, and that high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators based on a heterogeneous streaming architecture. Using a novel set of optimizations that enable efficient mapping of binarized neural networks to hardware, we implement fully connected, convolutional, and pooling layers, with per-layer compute resources tailored to user-provided throughput requirements. On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrate up to 12.3 million image classifications per second with 0.31 μs latency on the MNIST dataset at 95.8% accuracy, and 21,906 image classifications per second with 283 μs latency on the CIFAR-10 and SVHN datasets at 80.1% and 94.9% accuracy, respectively. To the best of our knowledge, these are the fastest classification rates reported to date on these benchmarks.
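The efficiency of binarized networks rests on a standard trick used by FINN and related BNN work: when weights and activations are constrained to {-1, +1} and packed as bits, a dot product collapses into an XNOR/XOR followed by a popcount, and batch normalization plus the sign activation fold into a single integer threshold comparison. The sketch below is a minimal software model of that arithmetic, not FINN's actual hardware mapping; the function names and the bit-packing convention are assumptions made for illustration.

```python
def popcount(x: int) -> int:
    """Count set bits (models a hardware popcount unit)."""
    return bin(x).count("1")

def binarized_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {-1,+1} vectors packed into n-bit words,
    with bit 1 encoding +1 and bit 0 encoding -1.
    Matching bits contribute +1 and differing bits -1, so
    dot = n - 2 * popcount(a XOR w), where XOR marks the mismatches."""
    return n - 2 * popcount(a_bits ^ w_bits)

def binarized_neuron(a_bits: int, w_bits: int, n: int, threshold: int) -> int:
    """Binary activation: batchnorm followed by sign() folds into one
    comparison against a precomputed integer threshold."""
    return 1 if binarized_dot(a_bits, w_bits, n) >= threshold else 0

# Example: 8-element vectors packed as bytes (MSB = first element).
a = 0b10110010  # encodes +1 -1 +1 +1 -1 -1 +1 -1
w = 0b10010110  # encodes +1 -1 -1 +1 -1 +1 +1 -1
print(binarized_dot(a, w, 8))        # XOR has 2 set bits: 8 - 2*2 = 4
print(binarized_neuron(a, w, 8, 0))  # 4 >= 0, so the output bit is 1
```

In hardware, the XOR and popcount reduce to cheap logic rather than multipliers, which is what makes it practical to scale per-layer compute resources up or down to meet a throughput target.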
