Binaural Sound Source Localisation in Complex Conditions

There has been growing interest in reproducing human spatial hearing behaviours, driven by the development of spatial audio signal processing techniques. To accurately localise single or multiple sound sources using a humanoid apparatus, it is essential to exploit the spatial features introduced by the filtering effect of the human body, which requires an understanding of both the characteristics of those features and their mapping to source locations. In this thesis, we analyse and evaluate the characteristics of localisation features in binaural signals, and explore a method for constructing a localisation mapping model. Because a human-like apparatus reflects and diffracts sound, sound waves are filtered before being captured at the eardrum, and this filtering produces various behaviours in the frequency domain. This thesis first summarises the characteristics of those behaviours and evaluates their importance to localisation. We analyse and evaluate the correlation between source location and three main interaural cues: the interaural level difference, the interaural time difference and the interaural phase difference. Then, we explore how to exploit those features, and develop a novel feature vector by combining the most valuable spectra. Following this, by employing mutual information as the evaluation metric for frequency selection, we propose a new feature-to-location mapping model that embeds the feature evaluation process. The new mapping uses a multiple-tree structured model, based on the random forest, that shows high tolerance to noise. In computational simulations and practical experiments, the model shows improved accuracy and robustness, as measured by angular error and correct localisation rate.
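
As a rough illustration of the three interaural cues named above, the following minimal Python sketch estimates a broadband level difference, a time difference (as a cross-correlation lag) and a phase difference at one frequency from a toy two-channel signal. It is not the feature extraction used in the thesis: the function names, the test signal, the 16 kHz sampling rate, and the 5-sample delay and 0.5 gain applied to the right channel are all assumptions made for the example.

```python
import cmath
import math


def ild_db(left, right):
    """Broadband interaural level difference in dB (left relative to right)."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10(e_left / e_right)


def itd_samples(left, right, max_lag):
    """Lag in samples that maximises the cross-correlation.

    A positive result means the right channel lags the left channel.
    """
    def xcorr(lag):
        start = max(0, -lag)
        stop = min(len(left), len(right) - lag)
        return sum(left[n] * right[n + lag] for n in range(start, stop))

    return max(range(-max_lag, max_lag + 1), key=xcorr)


def ipd_rad(left, right, freq_hz, fs):
    """Interaural phase difference at one frequency, from a single-bin DFT."""
    w = -2j * math.pi * freq_hz / fs
    bin_l = sum(x * cmath.exp(w * n) for n, x in enumerate(left))
    bin_r = sum(x * cmath.exp(w * n) for n, x in enumerate(right))
    return cmath.phase(bin_l * bin_r.conjugate())


# Toy scene (assumed values): the right ear receives the source
# 5 samples later and at half the amplitude of the left ear.
FS = 16000
DELAY = 5
GAIN = 0.5
N = 1024
TONES_HZ = (500.0, 750.0, 1250.0)

source = [sum(math.sin(2 * math.pi * f * n / FS) for f in TONES_HZ)
          for n in range(N)]
left = source
right = [0.0] * DELAY + [GAIN * x for x in source[:N - DELAY]]

ild = ild_db(left, right)                  # close to 20*log10(1/GAIN)
itd = itd_samples(left, right, max_lag=16)  # lag in samples
ipd = ipd_rad(left, right, 500.0, FS)       # radians at 500 Hz
```

In this toy case the phase difference at 500 Hz is simply the time difference scaled by the angular frequency. A binaural front end would typically compute such cues per frequency band and per time frame rather than once over the whole signal.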
Finally, by combining our localisation method with the recently proposed direct-path transfer function estimation method based on a convolutive transfer function model, we design a binaural localisation system for unknown environments. The remainder of this thesis demonstrates the possibility of using active localisation cues in a binaural system. Based on observations of human active head rotation behaviour, we investigate the effect of dynamic features in binaural localisation. The analysis shows that head rotation enriches the variation of localisation features, which resolves the cone-of-confusion problem and simplifies vertical-wise


[61]  Philip S. Yu,et al.  Effective estimation of posterior probabilities: explaining the accuracy of randomized decision tree approaches , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[62]  Guy J. Brown,et al.  A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation , 2004, Speech Commun..

[63]  Ramani Duraiswami,et al.  Extracting the frequencies of the pinna spectral notches in measured head related impulse responses. , 2004, The Journal of the Acoustical Society of America.

[64]  S. Perrett,et al.  The effect of head rotations on vertical plane sound localization. , 1997, The Journal of the Acoustical Society of America.

[65]  Bill Gardner,et al.  HRTF Measurements of a KEMAR Dummy-Head Microphone , 1994 .

[66]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[67]  H. Wallach,et al.  The role of head movements and vestibular and visual cues in sound localization. , 1940 .

[68]  DeLiang Wang,et al.  Speech segregation based on sound localization , 2001, IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222).

[69]  C. Faller,et al.  Source localization in complex listening situations: selection of binaural cues based on interaural coherence. , 2004, The Journal of the Acoustical Society of America.

[70]  A. Moiseff,et al.  An artificial neural network for sound localization using binaural cues. , 1996, The Journal of the Acoustical Society of America.

[71]  Steven van de Par,et al.  A Probabilistic Model for Robust Localization Based on a Binaural Auditory Front-End , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[72]  Thushara D. Abhayapala,et al.  Spatial feature learning for robust binaural sound source localization using a composite feature vector , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[73]  Hong Liu,et al.  A new hierarchical binaural sound source localization method based on Interaural Matching Filter , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[74]  Volker Willert,et al.  A Probabilistic Model for Binaural Sound Localization , 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[75]  Ning Ma,et al.  The PASCAL CHiME speech separation and recognition challenge , 2013, Comput. Speech Lang..

[76]  Brian J. d'Auriol,et al.  A novel feature selection method based on normalized mutual information , 2011, Applied Intelligence.

[77]  Mark R. Anderson,et al.  Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual speech source. , 2001, Journal of the Audio Engineering Society. Audio Engineering Society.

[78]  Durand R. Begault,et al.  3-D Sound for Virtual Reality and Multimedia Cambridge , 1994 .

[79]  R. O. Schmidt,et al.  Multiple emitter location and signal Parameter estimation , 1986 .

[80]  J.-M. Boucher,et al.  A New Method Based on Spectral Subtraction for Speech Dereverberation , 2001 .

[81]  Tobias May Robust Speech Dereverberation With a Neural Network-Based Post-Filter That Exploits Multi-Conditional Training of Binaural Cues , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[82]  DeLiang Wang,et al.  CASA-Based Robust Speaker Identification , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[83]  M. Bodden Modeling human sound-source localization and the cocktail-party-effect , 1993 .

[84]  V. Ralph Algazi,et al.  An adaptable ellipsoidal head model for the interaural time difference , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[85]  H. Takemoto,et al.  Mechanism for generating peaks and notches of head-related transfer functions in the median plane. , 2012, The Journal of the Acoustical Society of America.

[86]  Richard F. Lyon A computational model of binaural localization and separation , 1983, ICASSP.

[87]  Kazuhiro Iida,et al.  Median plane localization using a parametric model of the head-related transfer function based on spectral cues , 2007 .

[88]  Benesty,et al.  Adaptive eigenvalue decomposition algorithm for passive acoustic source localization , 2000, The Journal of the Acoustical Society of America.

[89]  Harald Viste,et al.  Binaural Source Localization by Joint Estimation of ILD and ITD , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[90]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[91]  Jie Huang,et al.  Echo avoidance in a computational model of the precedence effect , 1999, Speech Commun..

[92]  DeLiang Wang,et al.  Exploring Monaural Features for Classification-Based Speech Segregation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[93]  Dorothea Kolossa,et al.  Monte Carlo exploration for active binaural localization , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[94]  Daniel P. W. Ellis,et al.  Model-Based Expectation-Maximization Source Separation and Localization , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[95]  Joseph P. Olive,et al.  Text-to-speech synthesis , 1995, AT&T Technical Journal.

[96]  Keith D. Martin Estimating azimuth and elevation from interaural differences , 1995, Proceedings of 1995 Workshop on Applications of Signal Processing to Audio and Accoustics.

[97]  Kazuhiro Iida,et al.  HRTF and Sound Localization in the Median Plane , 2019, Head-Related Transfer Function and Acoustic Virtual Reality.

[98]  Toshihiro Furukawa,et al.  A cepstrum prefiltering approach for DOA estimation of speech signal in reverberant environments , 2013, 21st European Signal Processing Conference (EUSIPCO 2013).

[99]  Raffaele Parisi,et al.  Prefiltering approaches for time delay estimation in reverberant environments , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[100]  Israel Cohen,et al.  Relative Transfer Function Identification Using Convolutive Transfer Function Approximation , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[101]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[102]  Hong Wang,et al.  Voice source localization for automatic camera pointing system in videoconferencing , 1997, Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics.

[103]  Henning Puder,et al.  Signal Processing in High-End Hearing Aids: State of the Art, Challenges, and Future Trends , 2005, EURASIP J. Adv. Signal Process..

[104]  Hong Liu,et al.  A binaural sound source localization model based on time-delay compensation and interaural coherence , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[105]  Benoît Champagne,et al.  A new cepstral prefiltering technique for estimating time delay under reverberant conditions , 1997, Signal Process..

[106]  Buket D. Barkana,et al.  Energy Estimation between Adjacent Formant Frequencies to Identify Speaker's Gender , 2008, Fifth International Conference on Information Technology: New Generations (itng 2008).

[107]  M. Gardner,et al.  Problem of localization in the median plane: effect of pinnae cavity occlusion. , 1973, The Journal of the Acoustical Society of America.

[108]  Wen Zhang,et al.  Binaural sound source localization using the frequency diversity of the head-related transfer function. , 2014, The Journal of the Acoustical Society of America.

[109]  L. Rayleigh,et al.  XII. On our perception of sound direction , 1907 .

[110]  L A JEFFRESS,et al.  A place theory of sound localization. , 1948, Journal of comparative and physiological psychology.

[111]  José Santos-Victor,et al.  Sound Localization for Humanoid Robots - Building Audio-Motor Maps based on the HRTF , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[112]  Philip J. B. Jackson,et al.  Robust Full-Sphere Binaural Sound Source Localization , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[113]  DeLiang Wang,et al.  Binaural Localization of Multiple Sources in Reverberant and Noisy Environments , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[114]  Bo-sun Xie,et al.  Head-Related Transfer Function and Virtual Auditory Display: E-Book , 2013 .

[115]  Radu Horaud,et al.  2D sound-source localization on the binaural manifold , 2012, 2012 IEEE International Workshop on Machine Learning for Signal Processing.

[116]  Daniel J. Tollin,et al.  The Precedence Effect in Sound Localization , 2015, Journal of the Association for Research in Otolaryngology.

[117]  F. Keyrouz,et al.  An Enhanced Binaural 3D Sound Localization Algorithm , 2006, 2006 IEEE International Symposium on Signal Processing and Information Technology.

[118]  F L Wightman,et al.  Resolution of front-back ambiguity in spatial hearing by listener and source movement. , 1999, The Journal of the Acoustical Society of America.

[119]  Samuel W. Clapp,et al.  A Binaural Model that Analyses Acoustic Spaces and Stereophonic Reproduction Systems by Utilizing Head Rotations , 2013 .

[120]  Helmut Haas,et al.  The Influence of a Single Echo on the Audibility of Speech , 1972 .

[121]  Guy J. Brown,et al.  Robust Binaural Localization of a Target Sound Source by Combining Spectral Source Models and Deep Neural Networks , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[122]  J. C. R. Licklider,et al.  The Influence of Interaural Phase Relations upon the Masking of Speech by White Noise , 1948 .

[123]  W M Hartmann,et al.  Identification and localization of sound sources in the median sagittal plane. , 1999, The Journal of the Acoustical Society of America.

[124]  Mohan M. Trivedi,et al.  Analysis of time-delay estimation in reverberant environments , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.