Multi-Source DOA Estimation Through Pattern Recognition of the Modal Coherence of a Reverberant Soundfield

We propose a novel multi-source direction of arrival (DOA) estimation technique using a convolutional neural network (CNN) that learns the modal coherence patterns of an incident soundfield from measured spherical harmonic coefficients. We train our model on individual time-frequency bins of the short-time Fourier transform spectrum by analyzing the unique modal coherence snapshot associated with each desired direction. The proposed method can estimate multiple simultaneously active sound sources in 3D space using a single-source training scheme. This scheme reduces training time and resource requirements, and it allows the same trained model to be reused for different multi-source combinations. The method is evaluated in various simulated and real noisy, reverberant environments with varying acoustic conditions and is found to outperform the baseline methods in DOA estimation accuracy. Furthermore, the proposed algorithm allows azimuth and elevation to be trained independently for full DOA estimation over 3D space, which significantly improves training efficiency without affecting overall estimation accuracy.
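
The abstract does not specify the exact feature layout or how per-bin outputs are combined into multiple DOAs, so the following is only a minimal sketch under stated assumptions. It assumes (1) the modal coherence snapshot of one time-frequency bin is the outer product of its spherical harmonic coefficient vector with its own conjugate, normalized by its trace and split into real and imaginary channels as a two-channel CNN input; and (2) a single-source-trained classifier (not shown) outputs a probability over a discrete, circular azimuth grid for each bin, and these probabilities are pooled across bins before the strongest peaks are picked. The function names `coherence_snapshot`, `cnn_input` and `pick_doas` and the `min_sep` parameter are hypothetical.

```python
from typing import List, Sequence


def coherence_snapshot(alpha: Sequence[complex]) -> List[List[complex]]:
    """Modal coherence snapshot alpha * alpha^H for one time-frequency bin.

    `alpha` holds the spherical harmonic coefficients alpha_nm of that bin,
    flattened in (n, m) order; entry [i][j] is alpha_i * conj(alpha_j).
    """
    return [[a * b.conjugate() for b in alpha] for a in alpha]


def cnn_input(alpha: Sequence[complex]) -> List[List[List[float]]]:
    """Assumed two-channel (real, imaginary) image, normalized by the trace."""
    c = coherence_snapshot(alpha)
    # Trace = total modal power; dividing by it makes the feature level-independent.
    trace = sum(c[i][i].real for i in range(len(c))) or 1.0
    real = [[v.real / trace for v in row] for row in c]
    imag = [[v.imag / trace for v in row] for row in c]
    return [real, imag]


def pick_doas(bin_probs: Sequence[Sequence[float]], num_sources: int,
              min_sep: int = 1) -> List[int]:
    """Pool per-bin class probabilities and return the strongest DOA classes.

    Assumes a circular azimuth grid; classes within `min_sep` grid steps
    of an already chosen peak are skipped.
    """
    num_classes = len(bin_probs[0])
    # Sum each class's probability over all time-frequency bins.
    pooled = [sum(p[k] for p in bin_probs) for k in range(num_classes)]
    order = sorted(range(num_classes), key=lambda k: pooled[k], reverse=True)
    chosen: List[int] = []
    for k in order:
        if len(chosen) == num_sources:
            break
        # Circular distance on the azimuth grid to every chosen peak.
        if all(min(abs(k - c), num_classes - abs(k - c)) > min_sep for c in chosen):
            chosen.append(k)
    return chosen


# Toy example: 8 azimuth classes (45-degree grid) and 4 bins of classifier output.
probs = [
    [0.0, 0.7, 0.2, 0.0, 0.0, 0.0, 0.1, 0.0],
    [0.0, 0.6, 0.3, 0.0, 0.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.0, 0.0, 0.8, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.1, 0.7, 0.2, 0.0],
]
print(pick_doas(probs, num_sources=2))  # [5, 1]
```

Because each bin is classified as if only one source were active, the same single-source model can serve any number of sources; only the pooling and peak-picking step depends on `num_sources`.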