Deep network source localization and the influence of sensor geometry

Learning-based localization approaches cast acoustic speaker localization as a machine learning task: a classifier is trained on example acoustic feature vectors to predict the likelihood of speech presence as a spatio-temporal distribution. We investigate the impact that fundamental parameters of the auditory scene (e.g., SNR, acoustic scene complexity, sensor geometry) exert on the ability to faithfully extract spatio-temporal activity maps for concurrent speakers. To this end, we systematically evaluate localization performance for a set of deep neural network localizers of varying complexity and for six different sensor configurations in a bilateral hearing aid setup. Our results indicate that shortcomings in the acoustic conditions can, to some degree, be compensated for by more complex classification techniques: deep networks outperform linear localizers, and their performance benefits more from an increase in the number of sensor channels. In specific configurations, deep networks with fewer microphones even outperform a linear baseline network with more microphones. Location-specific information in source-interference scenarios thus appears to be encoded non-linearly in the sound field, requiring non-linear approaches for optimal decoding.
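The learning-based formulation above can be illustrated with a minimal sketch. This is not the paper's model: it is a small numpy multi-layer perceptron that maps synthetic inter-channel phase-difference features (a common localization feature) to discrete source-direction classes. The analysis frequencies, inter-microphone delays, noise level, and network size are all hypothetical choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scene: 4 direction classes, each characterized by one
# inter-microphone time delay; features are noisy (cos, sin) values of the
# resulting inter-channel phase differences at 8 analysis frequencies.
freqs = np.linspace(200.0, 1600.0, 8)      # analysis frequencies (Hz)
tdoas = np.linspace(-3e-4, 3e-4, 4)        # per-class inter-mic delays (s)
n_per_class = 150

labels = np.repeat(np.arange(4), n_per_class)
phase = 2.0 * np.pi * freqs[None, :] * tdoas[labels][:, None]
X = np.concatenate([np.cos(phase), np.sin(phase)], axis=1)
X += 0.3 * rng.standard_normal(X.shape)    # sensor/feature noise
Y = np.eye(4)[labels]                      # one-hot direction targets

# One-hidden-layer network with softmax output, trained by full-batch
# gradient descent on the cross-entropy loss.
D, H, K = X.shape[1], 16, 4
W1 = 0.5 * rng.standard_normal((D, H)); b1 = np.zeros(H)
W2 = 0.5 * rng.standard_normal((H, K)); b2 = np.zeros(K)
lr = 0.5
for _ in range(400):
    A1 = np.tanh(X @ W1 + b1)              # hidden-layer activations
    Z2 = A1 @ W2 + b2
    P = np.exp(Z2 - Z2.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)      # class posteriors (softmax)
    # Backpropagation of the cross-entropy gradient.
    dZ2 = (P - Y) / len(X)
    dW2, db2 = A1.T @ dZ2, dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2.T) * (1.0 - A1 ** 2)   # tanh derivative
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

accuracy = np.mean(P.argmax(axis=1) == labels)
print(f"training accuracy: {accuracy:.2f}")
```

The per-frame posteriors `P` play the role of the spatial likelihood distribution discussed above; stacking them over time frames would yield a spatio-temporal activity map. Replacing the hidden layer with a direct linear-softmax mapping gives the linear-baseline counterpart.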
