Robust localisation of multiple speakers exploiting head movements and multi-conditional training of binaural cues

This paper addresses the problem of localising multiple competing speakers in the presence of room reverberation, where sound sources can be positioned at any azimuth on the horizontal plane. To reduce the number of front-back confusions, which can occur due to the similarity of interaural time differences (ITDs) and interaural level differences (ILDs) in the front and rear hemifields, a machine hearing system is presented which combines supervised learning of binaural cues using multi-conditional training (MCT) with a head movement strategy. A systematic evaluation showed that this approach substantially reduced the number of front-back confusions in challenging acoustic scenarios. Moreover, the system was able to generalise to a variety of acoustic conditions not seen during training.
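The head movement strategy rests on a simple geometric fact: a frontal source and its mirror-image rear source produce (nearly) the same static ITD, but a head rotation shifts their ITDs in opposite directions, so the sign of the observed ITD change resolves the ambiguity. The sketch below illustrates this with a simplified sine-law spherical-head ITD approximation (head radius and speed of sound are assumed values); it is an illustration of the principle, not the paper's actual localisation model.

```python
import math

def itd(azimuth_deg, head_radius=0.09, c=343.0):
    """Simplified spherical-head ITD model (seconds): ITD ~ a/c * sin(azimuth).
    Azimuth 0 deg = straight ahead, positive = to the listener's right."""
    return head_radius / c * math.sin(math.radians(azimuth_deg))

front, rear = 30.0, 150.0                 # mirror-symmetric azimuths
assert abs(itd(front) - itd(rear)) < 1e-12  # identical ITD: front-back ambiguous

rot = 10.0  # rotate the head 10 deg to the left: relative azimuths grow by +10
d_front = itd(front + rot) - itd(front)   # ITD change under the front hypothesis
d_rear  = itd(rear + rot) - itd(rear)     # ITD change under the rear hypothesis

# The two hypotheses predict opposite-signed ITD shifts, so the shift actually
# observed after the head movement tells front from back.
print(d_front > 0, d_rear < 0)
```

In a full system this sign test would be combined with the learned ITD/ILD classifier rather than applied to raw ITDs, but the disambiguating information comes from the same opposite-signed cue change.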
