论文信息 - A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification

A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification

We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC). Specifically, we tackle the ASC task in a low-resource environment leveraging a recently proposed advanced neural network pruning mechanism, namely Lottery Ticket Hypothesis (LTH), to find a sub-network neural model associated with a small amount non-zero model parameters. The effectiveness of LTH for low-complexity acoustic modeling is assessed by investigating various data augmentation and compression schemes, and we report an efficient joint framework for low-complexity multi-device ASC, called Acoustic Lottery. Acoustic Lottery could compress an ASC model up to 1/10 and attain a superior performance (validation accuracy of 74.01% and Log loss of 0.76) compared to its not compressed seed model. All results reported in this work are based on a joint effort of four groups, namely GT-USTC-UKE-Tencent, aiming to address the “Low-Complexity Acoustic Scene Classification (ASC) with Multiple Devices” in the DCASE 2021 Challenge Task 1a.

[1] Chin-Hui Lee,et al. Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2] Yonghong Yan,et al. Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling , 2019, ArXiv.

[3] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.

[4] Yuexian Zou,et al. Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification , 2021, Interspeech 2021.

[5] Chin-Hui Lee,et al. An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances , 2020, INTERSPEECH.

[6] Dima Damen,et al. Slow-Fast Auditory Streams for Audio Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7] Ankit Shah,et al. DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System , 2017, DCASE.

[8] Chin-Hui Lee,et al. Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation , 2020, ArXiv.

[9] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[11] Tuomas Virtanen,et al. A multi-device dataset for urban acoustic scene classification , 2018, DCASE.

[12] Zhongqin Wu,et al. Multi-Scale Temporal Convolution Network for Classroom Voice Detection , 2021, ArXiv.

[13] Annamaria Mesaros,et al. Acoustic Scene Classification in DCASE 2020 Challenge: Generalization Across Devices and Low Complexity Solutions , 2020, DCASE.

[14] Franz Pernkopf,et al. Acoustic Scene Classification for Mismatched Recording Devices Using Heated-Up Softmax and Spectrum Correction , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15] Chin-Hui Lee,et al. A Two-Stage Approach to Device-Robust Acoustic Scene Classification , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[16] Tuomas Virtanen,et al. Low-Complexity Acoustic Scene Classification for Multi-Device Audio: Analysis of DCASE 2021 Challenge Systems , 2021, DCASE.

[17] CNN-Based Acoustic Scene Classification System , 2021 .

[18] Gilad Yehudai,et al. Proving the Lottery Ticket Hypothesis: Pruning is All You Need , 2020, ICML.

[19] Michael Carbin,et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks , 2018, ICLR.

[20] Yifan Gong,et al. Learning small-size DNN with output-distribution-based criteria , 2014, INTERSPEECH.

[21] D. Filimonov,et al. Multi-Task Language Modeling for Improving Speech Recognition of Rare Words , 2020, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

[22] Hongyi Zhang,et al. mixup: Beyond Empirical Risk Minimization , 2017, ICLR.

[23] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[24] Mathieu Lagrange,et al. Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[25] Hye-jin Shim,et al. Attentive Max Feature Map for Acoustic Scene Classification with Joint Learning considering the Abstraction of Classes , 2021, ArXiv.

[26] Chin-Hui Lee,et al. Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification , 2020, INTERSPEECH.

[27] Jason Yosinski,et al. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask , 2019, NeurIPS.