Detection of overlapping acoustic events using a temporally-constrained probabilistic model

In this paper, a system for overlapping acoustic event detection is proposed, which models the temporal evolution of sound events. The system is based on probabilistic latent component analysis, supporting the use of a sound event dictionary where each exemplar consists of a succession of spectral templates. The temporal succession of the templates is controlled through event class-wise Hidden Markov Models (HMMs). As input time/frequency representation, the Equivalent Rectangular Bandwidth (ERB) spectrogram is used. Experiments are carried out on polyphonic datasets of office sounds generated using an acoustic scene simulator, as well as real and synthesized monophonic datasets for comparative purposes. Results show that the proposed system outperforms several state-of-the-art methods for overlapping acoustic event detection on the same task, using both frame-based and event-based metrics, and is robust to varying event density and noise levels.

[1]  Emmanuel Vincent,et al.  Adaptive Harmonic Spectral Decomposition for Multiple Pitch Estimation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  Birger Kollmeier,et al.  On the use of spectro-temporal features for the IEEE AASP challenge ‘detection and classification of acoustic scenes and events’ , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[3]  Tuomas Virtanen,et al.  Context-dependent sound event detection , 2013, EURASIP Journal on Audio, Speech, and Music Processing.

[4]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[5]  Tuomas Virtanen,et al.  Exemplar-Based Sparse Representations for Noise Robust Automatic Speech Recognition , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[7]  Chng Eng Siong,et al.  Overlapping sound event recognition using local spectrogram features and the generalised hough transform , 2013, Pattern Recognit. Lett..

[8]  Dan Stowell,et al.  Detection and classification of acoustic scenes and events: An IEEE AASP challenge , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[9]  Bhiksha Raj,et al.  Probabilistic Latent Variable Models as Nonnegative Factorizations , 2008, Comput. Intell. Neurosci..

[10]  Brian C. J. Moore,et al.  Chapter 5 – Frequency Analysis and Masking , 1995 .

[11]  Anssi Klapuri,et al.  Latent semantic analysis in sound event detection , 2011, 2011 19th European Signal Processing Conference.

[12]  P. Karsmakers,et al.  AN MFCC-GMM APPROACH FOR EVENT DETECTION AND CLASSIFICATION , 2013 .

[13]  Heikki Huttunen,et al.  Polyphonic sound event detection using multi label deep neural networks , 2015, 2015 International Joint Conference on Neural Networks (IJCNN).

[14]  Axel Röbel,et al.  An evaluation framework for event detection using a morphological model of acoustic scenes , 2015, ArXiv.

[15]  Vesa T. Peltonen,et al.  Audio-based context recognition , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[16]  Bart Vanrumste,et al.  An exemplar-based NMF approach to audio event detection , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[17]  Mathieu Lagrange,et al.  Characterisation of acoustic scenes using a temporally-constrained shift-invariant model , 2012 .

[18]  Daniel P. W. Ellis,et al.  Spectral vs. spectro-temporal features for acoustic event detection , 2011, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[19]  Julius O. Smith,et al.  A non-negative framework for joint modeling of spectral structure and temporal dynamics in sound mixtures , 2010 .

[20]  Tillman Weyde,et al.  An Efficient Temporally-Constrained Probabilistic Model for Multiple-Instrument Music Transcription , 2015, ISMIR.