Evaluating Quantized Convolutional Neural Networks for Embedded Systems
This paper evaluates the classification accuracy and inference-time speedups of
deep convolutional neural networks under various network quantizations. Quantized networks can achieve
much faster inference, allowing them to be deployed in real time on an embedded system such as a robot.
We evaluate networks with activations quantized to 1, 2, 4, and 8 bits and with binary weights. We found that
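As a minimal sketch of what k-bit activation quantization looks like, the function below maps an activation (assumed clipped to [0, 1]) onto one of 2^k uniformly spaced levels. This is the standard uniform quantizer used in low-bit networks; the paper's exact quantization function is not specified here, so treat this as an illustrative assumption.

```python
def quantize_activation(x, k):
    """Uniformly quantize an activation x (clipped to [0, 1]) to k bits.

    Maps x to the nearest of 2**k evenly spaced levels in [0, 1].
    Illustrative only; the paper's exact quantizer is an assumption.
    """
    levels = (1 << k) - 1           # number of intervals: 2^k - 1
    x = min(max(x, 0.0), 1.0)       # clip to the assumed activation range
    return round(x * levels) / levels

# With k = 2 there are four representable values: {0, 1/3, 2/3, 1}
print(quantize_activation(0.40, 2))   # snaps to the nearest level, 1/3
```

With binary weights, each multiplication against such a quantized activation reduces to sign flips and low-bit additions, which is the source of the inference speedups discussed above.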
network quantization can yield a significant speedup for a small drop in classification accuracy. Specifically,
modifying one of our networks to use an 8-bit quantized input layer and 2-bit activations in hidden layers,
we calculate a theoretical 9.9x speedup in exchange for an F1 score decrease of just 3.4% relative to a full
precision implementation. Higher speedups are obtainable by designing a network architecture in which a
smaller proportion of the total multiplications occurs in the input layer.
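The dependence of the speedup on where the multiplications occur can be sketched with a simple cost model. The function below assumes a b-bit multiply costs b/32 of a full-precision multiply; this cost model and the layer counts in the example are illustrative assumptions, not the paper's exact accounting.

```python
def theoretical_speedup(layer_mults, layer_bits, full_precision_bits=32):
    """Estimate speedup of a quantized network over full precision.

    Assumes (as a simplifying cost model, not the paper's exact one) that a
    b-bit multiply costs b / full_precision_bits of a full-precision multiply.
    layer_mults: multiplications per layer; layer_bits: activation bit-width
    per layer.
    """
    full_cost = sum(layer_mults) * full_precision_bits
    quant_cost = sum(m * b for m, b in zip(layer_mults, layer_bits))
    return full_cost / quant_cost

# Hypothetical network: an 8-bit input layer and 2-bit hidden layers.
# The fewer multiplications the input layer holds, the larger the speedup.
print(theoretical_speedup([10, 90], [8, 2]))   # input layer holds 10% of mults
print(theoretical_speedup([40, 60], [8, 2]))   # input layer holds 40% of mults
```

Under this model, shrinking the input layer's share of multiplications raises the overall speedup, consistent with the abstract's final observation.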