Joint model-based recognition and localization of overlapped acoustic events using a set of distributed small microphone arrays

In acoustic scene analysis, the sounds that occur often have to be detected in time, recognized, and localized in space. Usually, each of these tasks is performed separately. This paper presents and tests a model-based approach that carries them out jointly for the case of multiple simultaneous sources. The recognized event classes and their respective room positions are obtained with a single system that maximizes a combination of a large set of scores, each resulting from a different acoustic event model and a different beamformer output signal, where each beamformer output comes from one of several arbitrarily located small microphone arrays. Experimental work using a two-step method is reported for a specific scenario consisting of meeting-room acoustic events, either isolated or overlapped with speech. Tests carried out with two datasets show the advantage of the proposed approach over some commonly used techniques, and show that including estimated priors brings a further performance improvement.
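The joint decision described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single source, assumes the scores are log-likelihoods that are combined by summing over arrays, and uses hypothetical names (`joint_decode`, `scores`, `log_prior`). The abstract states only that a combination of scores is maximized and that estimated priors help, so the sum rule and the array layout here are illustrative choices.

```python
import numpy as np

def joint_decode(scores, log_prior=None):
    """Pick the (class, position) pair with the highest combined score.

    scores: array of shape (n_arrays, n_positions, n_classes), where
        scores[a, p, c] is an assumed log-likelihood of event model c
        on the output of array a's beamformer steered to position p.
    log_prior: optional array of shape (n_positions, n_classes) holding
        assumed log prior probabilities for each (position, class) pair.
    """
    scores = np.asarray(scores, dtype=float)
    # Assumed combination rule: sum the log scores over all arrays.
    combined = scores.sum(axis=0)  # shape (n_positions, n_classes)
    if log_prior is not None:
        combined = combined + np.asarray(log_prior, dtype=float)
    # Joint maximization over position and class in one step.
    pos, cls = np.unravel_index(np.argmax(combined), combined.shape)
    return int(cls), int(pos), float(combined[pos, cls])

# Toy example: 2 arrays, 3 candidate positions, 2 event classes.
toy_scores = np.zeros((2, 3, 2))
toy_scores[0, 1, 1] = 2.0
toy_scores[1, 1, 1] = 1.0
toy_scores[0, 2, 0] = 2.5
best_class, best_position, best_score = joint_decode(toy_scores)
```

Because class and position are chosen in a single maximization, a prior over (position, class) pairs can change the decision, which a separate recognize-then-localize pipeline would not allow in the same way.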
