Room Adaptive Conditioning Method for Sound Event Classification in Reverberant Environments

Ensuring robust performance across the variety of situations that arise in real-world environments is a challenging task in sound event classification. Reverberation is one of the most unpredictable and detrimental factors, especially in indoor environments. To alleviate this problem, we propose a conditioning method that supplies room impulse response (RIR) information to the network, helping it become less sensitive to the acoustic environment and focus on classifying the target sound. Experimental results show that the proposed method reduces the performance degradation caused by room reverberation. Notably, the method works even with a similar RIR inferred from the room type rather than the exact one, which makes it potentially usable in real-world applications.
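The abstract does not specify how the RIR information enters the network. As one possible reading, the following minimal sketch conditions intermediate feature maps on an RIR-derived vector through a feature-wise scale and shift (FiLM-style modulation). Everything here is an illustrative assumption rather than the authors' implementation: the function names, the energy-per-segment RIR descriptor, the synthetic exponentially decaying RIR, and the randomly initialised linear layers that stand in for learned parameters.

```python
import math
import random


def synthetic_rir(length, decay, rng):
    """Toy RIR (assumption): Gaussian noise with an exponentially decaying envelope."""
    return [rng.gauss(0.0, 1.0) * math.exp(-t / decay) for t in range(length)]


def rir_descriptor(rir, n_bands=4):
    """Hypothetical RIR summary: fraction of total energy in n_bands time segments."""
    total = sum(s * s for s in rir) or 1.0
    seg = max(1, len(rir) // n_bands)
    out = []
    for i in range(n_bands):
        start = i * seg
        # The last segment absorbs any remainder so that all samples are counted.
        end = len(rir) if i == n_bands - 1 else (i + 1) * seg
        out.append(sum(s * s for s in rir[start:end]) / total)
    return out


def make_linear(in_dim, out_dim, rng):
    """Randomly initialised (weights, bias) pair standing in for learned parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x, layer):
    """Apply y = W x + b for a (weights, bias) pair."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def film_condition(features, descriptor, gamma_layer, beta_layer):
    """Scale and shift each feature channel using values predicted from the descriptor.

    features: list of channels, each a list of activations (channel x time).
    The scale is parameterised as 1 + delta, so zero weights give the identity.
    """
    gammas = [1.0 + g for g in linear(descriptor, gamma_layer)]
    betas = linear(descriptor, beta_layer)
    return [[g * v + b for v in channel]
            for channel, g, b in zip(features, gammas, betas)]


if __name__ == "__main__":
    rng = random.Random(0)
    descriptor = rir_descriptor(synthetic_rir(400, 60.0, rng))
    features = [[rng.random() for _ in range(5)] for _ in range(3)]  # 3 channels
    gamma_layer = make_linear(len(descriptor), len(features), rng)
    beta_layer = make_linear(len(descriptor), len(features), rng)
    print(film_condition(features, descriptor, gamma_layer, beta_layer))
```

Because the descriptor is a coarse summary rather than the full impulse response, two rooms of the same type would map to nearby descriptors under this sketch, which is in the spirit of the abstract's claim that a similar RIR can stand in for the exact one.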
