Robust Acoustic Scene Classification to Multiple Devices Using Maximum Classifier Discrepancy and Knowledge Distillation

This paper proposes robust acoustic scene classification (ASC) to multiple devices using maximum classifier discrepancy (MCD) and knowledge distillation (KD). The proposed method employs domain adaptation to train multiple ASC models dedicated to each device and combines these multiple device-specific models using a KD technique into a multi-domain ASC model. For domain adaptation, the proposed method utilizes MCD to align class distributions that conventional DA for ASC methods have ignored. The multi-device robust ASC model is obtained by KD, combining the multiple device-specific ASC models by MCD that may have a lower performance for non-target devices. Our experiments show that the proposed MCD-based device-specific model improved ASC accuracy by at most 12.22% for target samples, and the proposed KD-based device-general model improved ASC accuracy by 2.13% on average for all devices.

[1]  Hye-jin Shim,et al.  Distilling the Knowledge of Specialist Deep Neural Networks in Acoustic Scene Classification , 2019 .

[2]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[3]  Bolun Wang,et al.  CIAIC-ASC SYSTEM FOR DCASE 2019 CHALLENGE TASK1 Technical Report , 2019 .

[4]  Lukás Burget,et al.  Convolutional Neural Networks and x-vector Embedding for DCASE2018 Acoustic Scene Classification Challenge , 2018, ArXiv.

[5]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[6]  Vesa T. Peltonen,et al.  Audio-based context recognition , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  Mei Wang,et al.  Deep Visual Domain Adaptation: A Survey , 2018, Neurocomputing.

[8]  Tatsuya Harada,et al.  Maximum Classifier Discrepancy for Unsupervised Domain Adaptation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Yuxing Peng,et al.  An Adversarial Feature Distillation Method for Audio Classification , 2019, IEEE Access.

[10]  C.-C. Jay Kuo,et al.  Where am I? Scene Recognition for Mobile Robots using Audio Features , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[11]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[12]  Kyogu Lee,et al.  Convolutional Neural Networks with Binaural Representations and Background Subtraction for Acoustic Scene Classification , 2017, DCASE.

[13]  Hanseok Ko,et al.  Deep Neural Network based learning and transferring mid-level audio features for acoustic scene classification , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[14]  Bernhard Schölkopf,et al.  A Kernel Two-Sample Test , 2012, J. Mach. Learn. Res..

[15]  Tuomas Virtanen,et al.  A multi-device dataset for urban acoustic scene classification , 2018, DCASE.

[16]  R. Radhakrishnan,et al.  Audio analysis for surveillance applications , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..