Over-Parameterization and Generalization in Audio Classification

Convolutional Neural Networks (CNNs) dominate classification tasks in many domains, including machine vision, machine listening, and natural language processing. In machine listening, CNNs generally generalize very well, but they are sensitive to the specific audio recording device used, which the Detection and Classification of Acoustic Scenes and Events (DCASE) community has recognized as a substantial problem. In this study, we investigate the relationship between the over-parameterization of acoustic scene classification models and their resulting generalization abilities. Our results indicate that increasing network width improves generalization to unseen devices, even when the number of parameters does not increase.
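The claim that width can grow without adding parameters can be made concrete with a little arithmetic. The sketch below is only an illustration: the abstract does not say how the parameter count is held fixed, so the use of grouped convolutions here is an assumption, and `conv_params` is a hypothetical helper rather than anything from the study.

```python
def conv_params(c_in, c_out, kernel=3, groups=1):
    """Weight count of a 2D convolution layer (bias terms omitted).

    Hypothetical helper for illustration; grouped convolution is an
    assumed way of widening a layer at a fixed parameter budget.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel sees only c_in / groups input channels.
    return kernel * kernel * (c_in // groups) * c_out


# Baseline layer: 64 -> 64 channels, 3x3 kernel.
base = conv_params(64, 64)

# Doubling the width of a dense layer quadruples its weights.
wide_dense = conv_params(128, 128)

# Doubling the width with 4 groups restores the baseline budget.
wide_grouped = conv_params(128, 128, groups=4)

print(base, wide_dense, wide_grouped)
```

Under this assumption, scaling the width by a factor w multiplies the dense weight count by w squared, so w squared groups (or an equivalent level of sparsity) are needed to keep the count unchanged.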
