Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models

Continued improvements in machine learning techniques offer exciting new opportunities through the use of larger models and larger training datasets. However, there is a growing need to offer these capabilities on board low-powered devices such as smartphones, wearables, and other embedded environments where only limited memory is available. Towards this, we consider methods to reduce the size of Conformer-based speech recognition models, which typically require more than 100M parameters, down to just $5$M parameters while minimizing the impact on model quality. Such a model allows us to achieve always-on ambient speech recognition on edge devices with low-memory neural processors. We propose model weight reuse at different levels within our model architecture: (i) repeating full conformer block layers, (ii) sharing specific conformer modules across layers, (iii) sharing sub-components per conformer module, and (iv) sharing decomposed sub-component weights after low-rank decomposition. By sharing weights at different levels of our model, we can retain the full model in memory while increasing the number of virtual transformations applied to the input. Through a series of ablation studies and evaluations, we find that with weight sharing and a low-rank architecture, we can achieve WERs of 2.84 and 2.94 on the LibriSpeech dev-clean and test-clean sets, respectively, with a $5$M-parameter model.
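The parameter-saving idea behind points (ii)–(iv) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dimensions, the single shared linear transform standing in for a conformer module, and all variable names are illustrative assumptions. It shows how one low-rank factor pair ($W \approx UV$) reused across several virtual layers replaces per-layer full-rank matrices, shrinking the stored parameter count.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): model width, decomposition
# rank, and number of virtual layers that reuse the same weights.
d_model, rank, num_layers = 256, 32, 4

# One shared low-rank factor pair; W = U @ V is never materialized.
U = rng.standard_normal((d_model, rank)) / np.sqrt(d_model)
V = rng.standard_normal((rank, d_model)) / np.sqrt(rank)

def shared_low_rank_layer(x):
    # Rank-r linear transform applied as two thin matmuls.
    return x @ U @ V

# Reusing the same factors per layer yields num_layers "virtual"
# transformations from a single stored weight set.
x = rng.standard_normal((1, d_model))
for _ in range(num_layers):
    x = shared_low_rank_layer(x)

shared_params = U.size + V.size               # 2 * d_model * rank
full_params = num_layers * d_model * d_model  # unshared, full-rank
print(shared_params, full_params)
```

With these toy sizes, the shared low-rank factors store 16,384 values versus 262,144 for four unshared full-rank layers, a 16x reduction; the same accounting, at the real model's dimensions, is what enables the $5$M-parameter budget.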
