SPIDERnet: Attention Network For One-Shot Anomaly Detection In Sounds

We propose a similarity function for one-shot anomaly detection in sounds (ADS) called the SPecific anomaly IDentifiER network (SPIDERnet). In ADS systems, overlooking an anomaly may result in serious incidents, so such systems need to be updated using the overlooked anomalous sample, of which there is often only one. A previous study proposed memory-based one-shot learning for this purpose. However, that method can detect only short anomalous sounds, such as collision sounds, because its similarity function is a naive mean squared error between the input and memorized spectrograms. To detect a wider variety of anomalous sounds, SPIDERnet consists of (i) a neural-network-based feature extractor that measures similarity in an embedded space and (ii) attention mechanisms that absorb time-frequency stretching. Experimental results on two public datasets indicate that SPIDERnet outperforms conventional methods and robustly detects various anomalous sounds.
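The abstract contrasts two similarity functions for memory-based one-shot detection. The following Python/NumPy sketch illustrates why a naive frame-by-frame mean squared error fails on a time-stretched anomaly, and how an attention step can re-align frames before measuring distance. It is not SPIDERnet itself. The functions `naive_mse_similarity`, `embed`, `attention_similarity` and `anomaly_score` are hypothetical. The fixed `embed` function stands in for a trained neural feature extractor, cropping to the shorter length in the baseline is an assumption, and the `temperature` value is arbitrary. Only time stretching is handled; frequency stretching is not.

```python
import numpy as np


def naive_mse_similarity(x, m):
    """Baseline similarity: negative MSE between two (freq, time) spectrograms.

    Assumption: both inputs are cropped to the shorter length, so frames are
    compared index by index and any time stretching misaligns them.
    """
    t = min(x.shape[1], m.shape[1])
    return -float(np.mean((x[:, :t] - m[:, :t]) ** 2))


def embed(spec):
    """Hypothetical stand-in for a trained neural feature extractor.

    Log-compresses the magnitudes, removes each frame's mean and
    L2-normalizes it, returning one vector per time frame: (time, freq).
    """
    frames = np.log1p(np.abs(spec)).T
    frames = frames - frames.mean(axis=1, keepdims=True)
    return frames / (np.linalg.norm(frames, axis=1, keepdims=True) + 1e-8)


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_similarity(x, m, temperature=0.1):
    """Negative mean distance after attention-based time alignment.

    Each input frame attends over all memorized frames (dot-product
    attention), so a stretched or shifted copy of the memorized sound can
    still be matched frame by frame. Frequency stretching is not handled.
    """
    ex, em = embed(x), embed(m)                  # (Tx, D), (Tm, D)
    weights = softmax(ex @ em.T / temperature)   # (Tx, Tm); rows sum to 1
    aligned = weights @ em                       # memorized frames re-timed to x
    return -float(np.mean(np.sum((ex - aligned) ** 2, axis=1)))


def anomaly_score(x, memory, similarity):
    """Memory-based score: similarity to the best-matching stored anomaly."""
    return max(similarity(x, m) for m in memory)


# Toy usage with random "spectrograms" (freq x time).
rng = np.random.default_rng(0)
memorized = rng.random((64, 50))             # the single overlooked anomaly
stretched = np.repeat(memorized, 2, axis=1)  # same anomaly at half speed
unrelated = rng.random((64, 100))            # some other sound

for name, x in [("stretched", stretched), ("unrelated", unrelated)]:
    print(name,
          round(naive_mse_similarity(x, memorized), 3),
          round(anomaly_score(x, [memorized], attention_similarity), 3))
```

In this toy setup, the stretched copy should score close to zero under the attention similarity. The naive MSE, by contrast, gives it roughly the same low similarity as the unrelated sound. A practical system would learn the feature extractor from data instead of using the fixed `embed` stand-in.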
