Anomalous Sound Event Detection Based on WaveNet

This paper proposes a new method of anomalous sound event detection for use in public spaces. The proposed method utilizes WaveNet, a generative model based on a convolutional neural network, to model in the time domain the various acoustic patterns that occur in public spaces. When the model encounters unknown acoustic patterns, they are identified as anomalous sound events. WaveNet has been used to precisely model waveform signals and to generate them directly by random sampling in generation tasks such as speech synthesis. In contrast, our proposed method uses WaveNet as a predictor rather than a generator, detecting waveform segments that cause large prediction errors as unknown acoustic patterns. Because WaveNet is capable of modeling detailed temporal structures of waveform signals, such as phase information, the proposed method is expected to detect anomalous sound events more accurately than conventional methods based on reconstruction errors of acoustic features. To evaluate the proposed method, we conduct an experiment using a real-world dataset recorded in a subway station, comparing it with conventional feature-based methods such as an auto-encoder and a long short-term memory network. Experimental results demonstrate that the proposed method outperforms the conventional methods and that the prediction errors of WaveNet can be used effectively as a metric for unsupervised anomaly detection.
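The idea of scoring waveform segments by a predictor's errors can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: a least-squares linear autoregressive predictor stands in for WaveNet, the signals are synthetic, and the function names, the frame length, and the mean-plus-three-standard-deviations threshold are hypothetical choices made only for this example.

```python
import numpy as np


def context_matrix(x, order):
    # Row t holds the `order` samples x[t], ..., x[t + order - 1];
    # the sample to be predicted from that row is x[t + order].
    return np.stack([x[i:len(x) - order + i] for i in range(order)], axis=1)


def fit_linear_predictor(x, order):
    # Least-squares autoregressive predictor (assumed stand-in for WaveNet).
    coef, *_ = np.linalg.lstsq(context_matrix(x, order), x[order:], rcond=None)
    return coef


def prediction_errors(x, coef):
    # Squared error between each sample and its prediction from past samples.
    order = len(coef)
    return (x[order:] - context_matrix(x, order) @ coef) ** 2


def frame_scores(err, frame_len):
    # Average the per-sample errors over non-overlapping frames.
    n = len(err) // frame_len
    return err[:n * frame_len].reshape(n, frame_len).mean(axis=1)


# Synthetic data: a noisy tone as "normal" sound, plus a test copy
# with a short noise burst playing the role of an unknown pattern.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
normal = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(sr)
test = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(sr)
test[8000:8400] += 0.5 * rng.standard_normal(400)

# Train on normal data only (unsupervised with respect to anomalies).
coef = fit_linear_predictor(normal, order=16)

# Threshold derived from the error statistics of the normal data (assumption).
train_scores = frame_scores(prediction_errors(normal, coef), frame_len=400)
threshold = train_scores.mean() + 3 * train_scores.std()

# Frames whose mean prediction error exceeds the threshold are flagged.
test_scores = frame_scores(prediction_errors(test, coef), frame_len=400)
anomalous_frames = np.flatnonzero(test_scores > threshold)
print(anomalous_frames)
```

The predictor is fitted only to normal data, so segments it cannot predict well stand out by their error. A linear predictor captures far less temporal structure than WaveNet; it is used here only to keep the sketch self-contained.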
