Adaptive Noise Reduction for Sound Event Detection Using Subband-Weighted NMF

Sound event detection in real-world environments suffers from interference by non-stationary, time-varying noise. This paper presents an adaptive noise reduction method for sound event detection based on non-negative matrix factorization (NMF). First, a noise dictionary is learned from the noisy input signal itself using robust NMF, which allows the method to adapt to noise variations. The estimated noise dictionary is combined with a pre-trained event dictionary in a supervised source separation framework. Second, to improve separation quality, we extend the basic NMF model to a weighted form that varies the relative importance of the different components when separating a target sound event from noise. With properly designed weights, the separation relies more heavily on the dominant event components, while the noise is strongly suppressed. The proposed method is evaluated on a dataset from the rare sound event detection task of the DCASE 2017 challenge and achieves results comparable to the top-ranking system, which is based on convolutional recurrent neural networks (CRNNs). The weighted NMF method shows strong noise reduction ability and improves the F-score by 5% compared to the unweighted approach.
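
To make the weighted separation step concrete, the following is a minimal sketch, not the authors' implementation. It assumes a weighted Euclidean cost with one non-negative weight per frequency bin, the standard multiplicative update for the activations with both dictionaries held fixed, and a soft-mask reconstruction. The function names, the weight vector `w`, and the choice of cost are illustrative assumptions; the paper's actual divergence, weight design, and robust NMF noise estimation are not reproduced here.

```python
import numpy as np


def weighted_nmf_activations(V, D, w, n_iter=200, eps=1e-12, seed=0):
    """Estimate activations H >= 0 for V ~= D @ H with a fixed dictionary D.

    Minimizes sum_f w[f] * sum_t (V - D @ H)[f, t] ** 2 (an assumed cost)
    using multiplicative updates, so H stays non-negative throughout.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((D.shape[1], V.shape[1])) + eps
    Wt = w[:, None]          # broadcast each frequency weight over all frames
    WV = Wt * V
    for _ in range(n_iter):
        H *= (D.T @ WV) / (D.T @ (Wt * (D @ H)) + eps)
    return H


def weighted_cost(V, D, H, w):
    """Weighted squared reconstruction error (the cost assumed above)."""
    return float(np.sum(w[:, None] * (V - D @ H) ** 2))


def separate_event(V, D_event, D_noise, w, n_iter=200):
    """Soft-mask estimate of the event spectrogram from a noisy one."""
    D = np.hstack([D_event, D_noise])      # supervised: both dictionaries fixed
    H = weighted_nmf_activations(V, D, w, n_iter)
    k = D_event.shape[1]
    event = D_event @ H[:k]                # part explained by event atoms
    noise = D_noise @ H[k:]                # part explained by noise atoms
    mask = event / (event + noise + 1e-12)
    return mask * V
```

In this sketch, `V` is a non-negative magnitude spectrogram (frequency by time), `D_event` would come from pre-training on isolated events, and `D_noise` would come from the noise estimation stage. Setting larger entries of `w` in the bins where the target event is dominant makes the fit prioritize those bins, which is the intuition behind the weighting described above.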
