daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices

It is always well believed that Binary Neural Networks (BNNs) could drastically accelerate the inference efficiency by replacing the arithmetic operations in float-valued Deep Neural Networks (DNNs) with bit-wise operations. Nevertheless, there has not been open-source implementation in support of this idea on low-end ARM devices (e.g., mobile phones and embedded devices). In this work, we propose daBNN --- a super fast inference framework that implements BNNs on ARM devices. Several speed-up and memory refinement strategies for bit-packing, binarized convolution, and memory layout are uniquely devised to enhance inference efficiency. Compared to the recent open-source BNN inference framework, BMXNet, our daBNN is 7x~23x faster on a single binary convolution, and about 6x faster on Bi-Real Net 18 (a BNN variant of ResNet-18). The daBNN is a BSD-licensed inference framework, and its source code, sample projects and pre-trained models are available on-line: https://github.com/JDAI-CV/dabnn.

[1]  Christoph Meinel,et al.  BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet , 2017, ACM Multimedia.

[2]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jing Hu,et al.  BitStream: Efficient Computing Architecture for Real-Time Low-Power Inference of Binary Neural Networks on CPUs , 2018, ACM Multimedia.

[4]  Amit Agarwal,et al.  CNTK: Microsoft's Open-Source Deep-Learning Toolkit , 2016, KDD.

[5]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[6]  Charles X. Ling,et al.  Pelee: A Real-Time Object Detection System on Mobile Devices , 2018, NeurIPS.

[7]  Wei Liu,et al.  Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm , 2018, ECCV.

[8]  Zheng Zhang,et al.  MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.

[9]  Xuanzhe Liu,et al.  A First Look at Deep Learning Apps on Smartphones , 2018, WWW.

[10]  Jiangming Jin,et al.  BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[11]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[12]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .