Fast-BCNN: Massive Neuron Skipping in Bayesian Convolutional Neural Networks

Bayesian Convolutional Neural Networks (BCNNs) have emerged as a robust form of Convolutional Neural Networks (CNNs) with the capability of uncertainty estimation. A BCNN model is implemented by adding a dropout layer after each convolutional layer of the original CNN. By executing stochastic inference many times, a BCNN provides an output distribution that reflects the uncertainty of the final prediction. The repeated inferences in this process lead to much longer execution times, which makes it challenging to apply Bayesian techniques to CNNs in real-world applications. In this study, we propose Fast-BCNN, an FPGA-based hardware accelerator design that intelligently skips redundant computations for two types of neurons during repeated BCNN inferences. First, within a sample inference, we skip the dropped neurons that are predetermined by the dropout masks. Second, by leveraging information from the first inference together with the dropout masks, we predict the zero neurons and skip all of their corresponding computations during the following sample inferences. In particular, an optimization algorithm is employed to guarantee the accuracy of zero-neuron prediction while achieving the maximal computation reduction. To support our neuron-skipping strategy at the hardware level, we explore an efficient parallelism for CNN convolution that gracefully skips the corresponding computations for both types of neurons, and we propose a novel PE architecture that accommodates the parallel operation of convolution and prediction with negligible overhead. Experimental results demonstrate that Fast-BCNN achieves a 2.1×–8.2× speedup and 44%–84% energy reduction over a baseline CNN accelerator.
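To make the repeated-inference flow concrete, below is a minimal NumPy sketch of Monte Carlo dropout inference with the two skipping opportunities the abstract describes: dropped neurons known in advance from pre-generated dropout masks, and neurons predicted to be zero based on the first sample inference. This is an illustrative assumption of the scheme, not the paper's implementation: it uses dense layers instead of convolutions for brevity, the function and variable names are hypothetical, and the naive "reuse the first sample's zero pattern" rule stands in for the paper's optimization algorithm that guarantees prediction accuracy.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fast_bcnn_inference(x, weights, num_samples=20, p_drop=0.5):
    """MC-dropout inference sketch with mask-based and zero-neuron skipping.

    x:       input vector of shape (in_dim,)
    weights: list of weight matrices, each of shape (in_dim, out_dim)
    Returns the mean prediction and its standard deviation across samples,
    whose spread reflects predictive uncertainty.
    """
    rng = np.random.default_rng(0)
    # Dropout masks are generated before computation, so dropped neurons
    # are predetermined and their computations can be skipped outright.
    masks = [rng.random((num_samples, W.shape[1])) > p_drop for W in weights]

    outputs = []
    zero_pred = None  # per-layer zero-neuron predictions from sample 0
    for s in range(num_samples):
        h = x
        layer_zeros = []
        for l, W in enumerate(weights):
            mask = masks[l][s]
            if s == 0:
                pre = h @ W                   # full first inference
                layer_zeros.append(pre <= 0)  # record neurons zeroed by ReLU
            else:
                # Skip (1) dropped neurons and (2) neurons predicted zero
                # from the first inference; compute only the remainder.
                active = mask & ~zero_pred[l]
                pre = np.zeros(W.shape[1])
                pre[active] = h @ W[:, active]
            h = relu(pre) * mask  # ReLU activation followed by dropout
        if s == 0:
            zero_pred = layer_zeros
        outputs.append(h)

    out = np.stack(outputs)
    return out.mean(axis=0), out.std(axis=0)

# Hypothetical usage with random weights:
rng = np.random.default_rng(1)
ws = [rng.standard_normal((8, 16)), rng.standard_normal((16, 4))]
mean, std = fast_bcnn_inference(rng.standard_normal(8), ws)
```

In software this column slicing saves little, but it mirrors the hardware opportunity: every multiply-accumulate feeding a dropped or predicted-zero neuron is elided, which is where the reported speedup and energy reduction come from.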