Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions

This paper presents a novel machine-hearing system that exploits deep neural networks (DNNs) and head movements for binaural localisation of multiple speakers in reverberant conditions. DNNs are used to map binaural features, consisting of the complete cross-correlation function (CCF) and interaural level differences (ILDs), to the source azimuth. Our approach was evaluated using a localisation task in which sources were located in a full 360-degree azimuth range. As a result, front-back confusions often occurred due to the similarity of binaural features in the front and rear hemifields. To address this, a head movement strategy was incorporated in the DNN-based model to help reduce the front-back errors. Our experiments show that, compared to a system based on a Gaussian mixture model (GMM) classifier, the proposed DNN system substantially reduces localisation errors under challenging acoustic scenarios in which multiple speakers and room reverberation are present.
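To make the two binaural features named above concrete, the following is a minimal sketch of how a CCF and an ILD might be computed for a single frame of a left/right ear signal pair. It is an illustration under stated assumptions, not the system described in the paper: the function name `binaural_features`, the broadband (single-channel) processing, the 1 ms lag range, and the normalisation are all choices made here for illustration. A full system would typically compute such features per frequency channel and per time frame before passing them to a classifier.

```python
import numpy as np

def binaural_features(left, right, fs, max_lag_ms=1.0):
    """Return (ccf, ild_db) for one frame of left/right ear signals.

    Hypothetical helper for illustration only. The lag range
    (max_lag_ms) and broadband processing are assumptions.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    eps = 1e-12  # guards against division by zero and log(0)

    # Remove the mean so the correlation reflects signal shape, not offset.
    left = left - left.mean()
    right = right - right.mean()

    # Full cross-correlation; index len(right) - 1 corresponds to zero lag.
    full = np.correlate(left, right, mode="full")
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2)) + eps
    centre = len(right) - 1

    # Keep the complete CCF over lags in [-max_lag, +max_lag] samples,
    # rather than reducing it to a single peak (ITD) estimate.
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    ccf = full[centre - max_lag : centre + max_lag + 1] / norm

    # ILD as the left/right energy ratio in decibels.
    ild_db = 10.0 * np.log10((np.sum(left ** 2) + eps)
                             / (np.sum(right ** 2) + eps))
    return ccf, ild_db


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    right = rng.standard_normal(320)                    # 20 ms noise frame
    left = np.concatenate([np.zeros(5), right[:-5]])    # left lags by 5 samples
    ccf, ild_db = binaural_features(left, right, fs)
    peak_lag = int(np.argmax(ccf)) - (len(ccf) - 1) // 2
    print("CCF length:", len(ccf), "peak lag (samples):", peak_lag)
    print("ILD (dB):", round(float(ild_db), 2))
```

In this sketch a positive peak lag means the left-ear signal is delayed relative to the right. Because a source in front and its mirror image behind the head produce very similar CCF and ILD values, features of this kind alone cannot separate the two hemifields, which is the ambiguity the head movement strategy is meant to reduce.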
