Shiftry: RNN inference in 2KB of RAM

Traditionally, IoT devices send collected sensor data to an intelligent cloud where machine learning (ML) inference happens. However, this is rapidly changing, and there is a growing trend to run ML on the edge IoT devices themselves. An intelligent edge is attractive because it saves network round trips (efficiency) and keeps user data at the source (privacy). However, IoT devices are far more resource-constrained than the cloud, which makes running ML on them challenging. For instance, the Arduino Uno, a commonly used board, has only 2KB of RAM and 32KB of read-only Flash memory. Although recent breakthroughs in ML have produced recurrent neural network (RNN) models that achieve good accuracy with KB-sized models, deploying them on tiny devices under such hard memory constraints has remained elusive. We present Shiftry, an automatic compiler from high-level floating-point ML models to fixed-point C programs that use 8-bit and 16-bit integers, which have significantly lower memory requirements. For this conversion, Shiftry uses a data-driven float-to-fixed procedure and a RAM management mechanism. These techniques enable the first empirical evaluation of RNNs running on tiny edge devices. On simpler ML models that prior work could handle, Shiftry-generated code has lower latency and higher accuracy.
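
To make the float-to-fixed idea concrete, below is a minimal C sketch of the kind of fixed-point arithmetic such a compiler emits; it is an illustration under stated assumptions, not Shiftry's actual output. A real value x is stored as the 16-bit integer round(x * 2^scale), and products are computed in a wider 32-bit integer before being shifted back down. The helper names (float_to_fixed, fixed_mul) and the per-operand scale parameters are hypothetical.

#include <stdint.h>

/* Illustrative sketch, not Shiftry's generated code: a real value x is
   represented as the 16-bit integer round(x * 2^scale). */
typedef int16_t fixed16_t;

/* Quantize a float to 16-bit fixed point at the given scale
   (assumes |x| * 2^scale fits in 16 bits). */
static inline fixed16_t float_to_fixed(float x, int scale) {
    float scaled = x * (float)(1 << scale);
    return (fixed16_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Multiply two fixed-point values. The exact product has scale
   scale_a + scale_b and needs 32 bits; shifting down to scale_out
   trades a little precision for a result that fits back in 16 bits. */
static inline fixed16_t fixed_mul(fixed16_t a, fixed16_t b,
                                  int scale_a, int scale_b, int scale_out) {
    int32_t wide = (int32_t)a * (int32_t)b;
    return (fixed16_t)(wide >> (scale_a + scale_b - scale_out));
}

Choosing scales per variable so that intermediate values neither overflow nor lose too much precision is exactly the problem that a data-driven float-to-fixed procedure, as described above, automates.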
