Multi-Source DOA Estimation in Reverberant Environments by Joint Detection and Modeling of Time-Frequency Points

In this article, direction of arrival (DOA) estimation of multiple speech sources in reverberant environments is investigated based on the recordings of a soundfield microphone. First, the recordings are analyzed in the time-frequency (T-F) domain to detect both “points” (single T-F points) and “regions” (multiple adjacent T-F points) that are dominated by a single source with low reverberation, referred to as low-reverberant-single-source (LRSS) points. An LRSS point detection algorithm is then proposed that combines a joint dominance measure with instantaneous single-source point (SSP) identification. Next, the initial DOA estimates obtained at the detected LRSS points are modeled by a Gaussian mixture model (GMM), whose parameters are estimated with the expectation-maximization (EM) algorithm, and a rule-based method classifies each mixture component as a source or an outlier. Finally, the DOA of each actual source is obtained from the source components. Experiments on both simulated data and data recorded in an actual acoustic chamber demonstrate that the proposed algorithm outperforms several existing approaches for DOA estimation in reverberant environments.
