FastICARL: Fast Incremental Classifier and Representation Learning with Efficient Budget Allocation in Audio Sensing Applications

Various incremental learning (IL) approaches have been proposed to help deep learning models learn new tasks/classes continually without forgetting what was learned previously (i.e., to avoid catastrophic forgetting). With the growing number of deployed audio sensing applications that need to dynamically incorporate new tasks and changing input distributions from users, the ability to perform IL on-device becomes essential for both efficiency and user privacy. However, prior works suffer from high computational costs and storage demands, which hinder the on-device deployment of IL. In this work, to overcome these limitations, we develop FastICARL, an end-to-end, on-device IL framework that combines exemplar-based IL with quantization in the context of audio-based applications. We first employ a k-nearest-neighbor search to reduce the latency of IL. We then jointly apply a quantization technique to decrease the storage requirements of IL. We implement FastICARL on two types of mobile devices and demonstrate that it reduces IL time by up to 78-92% and storage requirements by 2-4 times without sacrificing performance. FastICARL enables complete on-device IL, ensuring user privacy since user data never needs to leave the device.
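The abstract names two mechanisms: k-nearest-neighbor exemplar selection to cut IL latency, and quantization of the stored exemplars to cut memory. Below is a minimal sketch of how these two pieces could fit together, assuming per-class feature embeddings in a NumPy array; the function names (`select_exemplars_knn`, `quantize_exemplars`), the max-heap bookkeeping, and the symmetric linear quantization scheme are illustrative assumptions, not the paper's exact implementation.

```python
import heapq
import numpy as np

def select_exemplars_knn(features: np.ndarray, budget: int) -> list:
    """Pick the `budget` samples closest to the class mean.

    A bounded max-heap keeps the running nearest neighbours, so one
    pass over n samples costs O(n log budget), avoiding the repeated
    mean-recomputation of iCaRL-style herding.
    """
    class_mean = features.mean(axis=0)
    heap = []  # entries (-distance, index): heap[0] is the farthest kept sample
    for i, f in enumerate(features):
        d = float(np.linalg.norm(f - class_mean))
        if len(heap) < budget:
            heapq.heappush(heap, (-d, i))
        elif -heap[0][0] > d:  # new sample is closer than the farthest kept one
            heapq.heapreplace(heap, (-d, i))
    return sorted(i for _, i in heap)

def quantize_exemplars(exemplars: np.ndarray, dtype=np.int8):
    """Store exemplars at reduced precision (e.g., 8-bit -> ~4x smaller).

    Symmetric linear quantization; the scale is returned so exemplars
    can be dequantized (q * scale) before rehearsal. Assumes a nonzero
    exemplar array.
    """
    scale = np.abs(exemplars).max() / np.iinfo(dtype).max
    q = np.round(exemplars / scale).astype(dtype)
    return q, scale
```

In this sketch the heap threshold test is the only per-sample work beyond a distance computation, which is where the latency saving would come from, while the storage saving comes purely from keeping the exemplar buffer at int8/int16 rather than float32.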
