Window-Dominant Signal Subspace Methods for Multiple Short-Term Speech Source Localization

Signal subspace methods have been widely used to localize multiple speech sources. However, most of them cannot count the number of sources and do not exploit the sparsity of speech in the frequency domain. This paper presents a grid-search window-dominant signal subspace (GS-WDSS) method and a closed-form WDSS (CF-WDSS) method for localizing short-term speech sources. Both methods rest on a generalized sparsity assumption: each window, consisting of several time-adjacent bins, is dominated by one source. The conventional assumption, that each individual bin is dominated by one source, is a special case of this generalized assumption. Under the generalized assumption, the principal eigenvector of the spatial correlation matrix computed on each window spans the signal subspace of the window-dominant source, and the direction of arrival (DOA) of that source is estimated from this eigenvector. The DOAs and the number of sources are then obtained from the histogram of the DOAs of all window-dominant sources. Using the generalized assumption significantly improves the accuracy of the DOA estimates for the window-dominant sources, at the cost of an acceptable masking effect. The superiority of the proposed methods is verified by simulations and real-world experiments.
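
The pipeline described above (per-window spatial correlation matrix, principal eigenvector, DOA estimate, DOA histogram) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the uniform linear array, the single frequency bin, the 1-degree grid, the simulated data, and the histogram threshold `min_share` are all assumptions made for illustration, and the names `steering_vector`, `window_doa`, and `summarize` are hypothetical.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in m/s (assumed)

def steering_vector(theta_deg, n_mics, spacing_m, freq_hz):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    delays = np.arange(n_mics) * spacing_m * np.sin(np.deg2rad(theta_deg)) / C_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)

def window_doa(x_win, grid_deg, spacing_m, freq_hz):
    """Grid-search DOA of the source dominating one window.

    x_win has shape (n_mics, n_bins): the time-adjacent bins of one window
    at a single frequency.
    """
    n_mics, n_bins = x_win.shape
    corr = x_win @ x_win.conj().T / n_bins      # spatial correlation matrix
    _, eigvecs = np.linalg.eigh(corr)           # eigenvalues in ascending order
    principal = eigvecs[:, -1]                  # principal eigenvector
    # Pick the grid angle whose steering vector best matches the eigenvector.
    scores = [abs(np.vdot(steering_vector(t, n_mics, spacing_m, freq_hz), principal))
              for t in grid_deg]
    return grid_deg[int(np.argmax(scores))]

def summarize(doa_per_window, grid_deg, min_share=0.2):
    """Keep histogram bins holding at least min_share of all windows (assumed rule)."""
    counts = np.array([np.sum(doa_per_window == t) for t in grid_deg])
    return grid_deg[counts >= min_share * len(doa_per_window)]

# Simulated data: two sources, each dominating alternating windows.
rng = np.random.default_rng(0)
n_mics, spacing_m, freq_hz, bins_per_window = 4, 0.04, 2000.0, 8
grid_deg = np.arange(-90, 91)
true_doas = [-30, 40]

estimates = []
for w in range(40):
    theta = true_doas[w % 2]
    s = rng.standard_normal(bins_per_window) + 1j * rng.standard_normal(bins_per_window)
    x_win = np.outer(steering_vector(theta, n_mics, spacing_m, freq_hz), s)
    noise = rng.standard_normal(x_win.shape) + 1j * rng.standard_normal(x_win.shape)
    x_win = x_win + 0.01 * noise                # weak sensor noise
    estimates.append(window_doa(x_win, grid_deg, spacing_m, freq_hz))

doas = summarize(np.array(estimates), grid_deg)
print("estimated source count:", len(doas), "DOAs (deg):", doas.tolist())
```

In this toy setting every window contains exactly one source, so the histogram should show two clear peaks near the simulated directions. With real speech, windows in which no single source dominates would add spurious histogram entries, which is why some peak-selection rule (here the simple `min_share` threshold) is needed to count the sources.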
