Determining Mixing Parameters From Multispeaker Data Using Speech-Specific Information

In this paper, we propose an approach for processing multispeaker speech signals collected simultaneously using a pair of spatially separated microphones in a real room environment. The spatial separation of the microphones results in a fixed time-delay of arrival of the speech signals from a given speaker at the pair of microphones. These time-delays are estimated by exploiting the impulse-like characteristic of the excitation during speech production. The differences in the time-delays for different speakers are used to determine the number of speakers from the mixed multispeaker speech signals. There are also differences in the signal levels, owing to the different distances between a speaker and each of the microphones, and these level differences dictate the values of the mixing parameters. Knowledge of speech production, especially the excitation source characteristics, is used to derive an approximate weight function for locating the regions specific to a given speaker. The scatter plots of the weighted and delay-compensated mixed speech signals are then used to estimate the mixing parameters. The proposed method is applied to data collected in an actual laboratory environment for an underdetermined case, where the number of speakers exceeds the number of microphones. Enhancement of the speech of an individual speaker is also examined using the time-delay and mixing-parameter information, and is evaluated using objective measures proposed in the literature.
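The pipeline described above, estimating the time-delay between the two microphone signals, compensating for it, and reading the relative scaling off the scatter plot of the compensated signals, can be sketched as follows. This is a minimal illustration on synthetic impulse-like excitation, not the paper's actual method: the function names, the synthetic signals, the plain cross-correlation used for delay estimation, and the use of |x1| as the weight function are all assumptions for illustration (the paper derives its weight function from excitation source characteristics).

```python
import numpy as np

def estimate_delay(x1, x2):
    """Lag (in samples) of x2 relative to x1, taken as the peak of
    the full cross-correlation of the two signals."""
    corr = np.correlate(x2, x1, mode="full")
    return int(np.argmax(corr)) - (len(x1) - 1)

def estimate_scaling(x1, x2_compensated, weight):
    """Weighted least-squares slope of the scatter plot of the
    delay-compensated second signal against the first; the slope
    estimates the relative scaling (mixing parameter)."""
    num = np.sum(weight * x1 * x2_compensated)
    den = np.sum(weight * x1 * x1)
    return num / den

# Synthetic impulse-like excitation standing in for one speaker,
# observed at two microphones with a delay of 7 samples and a
# relative scaling of 0.6, plus a small amount of noise.
rng = np.random.default_rng(0)
n = 2000
excitation = np.zeros(n)
idx = rng.choice(n - 100, size=40, replace=False)
excitation[idx] = rng.standard_normal(40)
x1 = excitation + 0.01 * rng.standard_normal(n)
x2 = 0.6 * np.roll(excitation, 7) + 0.01 * rng.standard_normal(n)

d_hat = estimate_delay(x1, x2)      # estimated time-delay of arrival
x2_comp = np.roll(x2, -d_hat)       # delay compensation
a_hat = estimate_scaling(x1, x2_comp, weight=np.abs(x1))  # mixing parameter
```

Here |x1| is a crude stand-in for a speaker-specific weight: it emphasizes the high-excitation (impulse) samples, where the scatter plot points lie close to the line whose slope is the mixing parameter.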
