Exploiting Parallel Audio Recordings to Enforce Device Invariance in CNN-based Acoustic Scene Classification

Distribution mismatches between the data seen at training time and at application time remain a major challenge in all application areas of machine learning. We study this problem in the context of machine listening (Task 1b of the DCASE 2019 Challenge). We propose a novel approach to learn domain-invariant classifiers in an end-to-end fashion by enforcing equal hidden-layer representations for domain-parallel samples, i.e., time-aligned recordings from different recording devices. No classification labels are needed for our domain adaptation (DA) method, which makes the data collection process cheaper.
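The core idea can be illustrated with a short training-step sketch: alongside the usual classification loss on the labelled source-device recordings, a pairwise alignment term penalises the distance between hidden-layer embeddings of time-aligned recordings from different devices, for which no labels are required. The snippet below is a minimal, assumption-based illustration in PyTorch; the function and variable names (`encoder`, `classifier`, `lambda_pair`, etc.) and the choice of mean-squared error as the pairwise distance are illustrative and not taken from the paper.

```python
# Minimal sketch of a training step with a paired-representation loss.
# Assumptions (not from the paper): PyTorch-style modules, MSE as the
# distance between embeddings, a single alignment weight lambda_pair.
import torch
import torch.nn.functional as F


def training_step(encoder, classifier, optimizer,
                  x_source, y_source, x_parallel_target,
                  lambda_pair=1.0):
    """One update combining the supervised loss (source device only)
    with an alignment loss on domain-parallel, time-aligned samples."""
    optimizer.zero_grad()

    # Hidden-layer representations of the time-aligned pair.
    h_source = encoder(x_source)            # embedding of source-device audio
    h_target = encoder(x_parallel_target)   # embedding of other-device audio

    # Supervised classification loss uses only the source-device labels.
    logits = classifier(h_source)
    cls_loss = F.cross_entropy(logits, y_source)

    # Encourage (approximately) equal representations for parallel recordings;
    # no classification labels are needed for the target-device audio.
    pair_loss = F.mse_loss(h_source, h_target)

    loss = cls_loss + lambda_pair * pair_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```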
