Hearing in a shoe-box: Binaural source position and wall absorption estimation using virtually supervised learning

This paper introduces a new framework for supervised sound source localization, referred to as virtually supervised learning. An acoustic shoe-box room simulator is used to generate a large number of binaural single-source audio scenes. These scenes are used to build a dataset of spatial binaural features annotated with acoustic properties such as the 3D source position and the walls' absorption coefficients. A probabilistic high- to low-dimensional regression framework is then used to learn a mapping from these features to the acoustic properties. Results indicate that this mapping successfully estimates not only the azimuth and elevation of new sources but also their range and even the walls' absorption coefficients, based solely on binaural signals. Results also reveal that incorporating random-diffusion effects in the data significantly improves the estimation of all parameters.
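The simulate, annotate, and regress loop described above can be illustrated with a small toy sketch. It is not the paper's method. The shoe-box room simulator is replaced by a free-field two-receiver model with no walls and no head, the binaural features are assumed to be a single interaural level difference (ILD) and interaural time difference (ITD), and the probabilistic regression is replaced by a plain nearest-neighbor regressor. All names and constants (`simulate_features`, `make_dataset`, `NearestNeighborRegressor`, `run_demo`, `RECEIVER_SPACING`) are hypothetical and chosen only for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
RECEIVER_SPACING = 0.18  # m; hypothetical distance between the two receivers


def simulate_features(azimuth_deg, range_m):
    """Toy free-field stand-in for a room simulator (no walls, no head).

    Returns [ILD in dB, ITD in seconds] for a single source.
    """
    az = np.deg2rad(azimuth_deg)
    source = range_m * np.array([np.sin(az), np.cos(az)])
    left = np.array([-RECEIVER_SPACING / 2, 0.0])
    right = np.array([RECEIVER_SPACING / 2, 0.0])
    d_left = np.linalg.norm(source - left)
    d_right = np.linalg.norm(source - right)
    # With 1/r amplitude decay, left level minus right level in dB.
    ild_db = 20.0 * np.log10(d_right / d_left)
    # Arrival time at the left receiver minus arrival time at the right one.
    itd_s = (d_left - d_right) / SPEED_OF_SOUND
    return np.array([ild_db, itd_s])


def make_dataset(n_scenes, rng):
    """Draw random scenes and annotate each feature vector with (azimuth, range)."""
    labels = np.column_stack([
        rng.uniform(-60.0, 60.0, n_scenes),  # azimuth in degrees
        rng.uniform(1.0, 3.0, n_scenes),     # range in metres
    ])
    features = np.array([simulate_features(az, r) for az, r in labels])
    return features, labels


class NearestNeighborRegressor:
    """Simple stand-in regressor: average the labels of the k closest scenes."""

    def fit(self, features, labels):
        self.mean_ = features.mean(axis=0)
        self.scale_ = features.std(axis=0)
        self.features_ = (features - self.mean_) / self.scale_
        self.labels_ = labels
        return self

    def predict(self, features, k=3):
        z = (features - self.mean_) / self.scale_
        dists = np.linalg.norm(z[:, None, :] - self.features_[None, :, :], axis=2)
        nearest = np.argsort(dists, axis=1)[:, :k]
        return self.labels_[nearest].mean(axis=1)


def run_demo(seed=0):
    """Train on simulated scenes only, then report mean azimuth error in degrees."""
    rng = np.random.default_rng(seed)
    train_x, train_y = make_dataset(2000, rng)
    test_x, test_y = make_dataset(200, rng)
    model = NearestNeighborRegressor().fit(train_x, train_y)
    predicted = model.predict(test_x)
    return np.abs(predicted[:, 0] - test_y[:, 0]).mean()


if __name__ == "__main__":
    print("mean azimuth error (degrees):", run_demo())
```

The point of the sketch is only the workflow: every training label comes from the simulator, so no measured recordings are needed to fit the mapping. Estimating wall absorption, as the paper does, would require a simulator that actually models reflections, which this toy model does not.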
