Spatial efficiency of blind source separation based on decorrelation - subjective and objective assessment

Blind source separation (BSS) is one of the newest multisensor methods; it exploits the statistical properties of simultaneously recorded independent signals to separate them. Its objective is similar to that of beamforming: a set of spatial filters that separate the source signals is calculated. It is therefore reasonable to investigate the spatial efficiency of BSS, which is the subject of this study. A dummy head with two microphones was used to record two signals in an anechoic chamber, target speech and babble noise, in different spatial configurations. The speech reception thresholds (SRTs, i.e. the signal-to-noise ratio, SNR, yielding 50% speech intelligibility) before and after applying the BSS algorithm (Parra and Spence, 2000) were then determined for audiologically normal subjects. A significant improvement in speech intelligibility was observed after BSS was applied, in most cases when the target and masker sources were spatially separated. Moreover, the objective (SNR enhancement) and subjective (intelligibility improvement) assessment methods are compared. It must be emphasized that these measures give different results.
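The two kinds of measure mentioned above can be illustrated with a minimal Python sketch. It is not claimed to be the procedure used in the study: the function names `snr_db` and `srt_from_scores` are hypothetical, the SRT is read off by simple linear interpolation of intelligibility scores (assumed to increase monotonically with SNR), and all numbers are made up for illustration only.

```python
import numpy as np

def snr_db(target, noise):
    """Objective measure: SNR in dB from separately available target and noise signals."""
    target = np.asarray(target, dtype=float)
    noise = np.asarray(noise, dtype=float)
    return 10.0 * np.log10(np.mean(target ** 2) / np.mean(noise ** 2))

def srt_from_scores(snrs_db, scores):
    """Subjective measure: SNR at which intelligibility crosses 50%.

    Uses linear interpolation; assumes `scores` (fractions in 0..1)
    increase monotonically with `snrs_db`.
    """
    snrs_db = np.asarray(snrs_db, dtype=float)
    scores = np.asarray(scores, dtype=float)
    return float(np.interp(0.5, scores, snrs_db))

# Illustrative, made-up intelligibility scores (not data from the study).
snrs = [-12.0, -9.0, -6.0, -3.0, 0.0]
scores_before = [0.05, 0.20, 0.50, 0.80, 0.95]
scores_after = [0.20, 0.50, 0.80, 0.95, 0.99]

srt_before = srt_from_scores(snrs, scores_before)
srt_after = srt_from_scores(snrs, scores_after)

# A positive value means the 50% point moved to a lower (harder) SNR.
srt_improvement = srt_before - srt_after
print(srt_before, srt_after, srt_improvement)
```

An SRT shift of this kind is measured on listeners, whereas an SNR enhancement is computed from the signals alone, so the two quantities need not agree.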

[1]  Matti Hämäläinen,et al.  Filter-and-sum beamformer with adjustable filter characteristics , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Hiroshi Sawada,et al.  Frequency-Domain Blind Source Separation , 2007, Blind Speech Separation.

[3]  Aleksander Sek,et al.  Polish sentence tests for measuring the intelligibility of speech in interfering noise , 2009, International journal of audiology.

[4]  Jean-Francois Cardoso,et al.  Eigen-structure of the fourth-order cumulant tensor with application to the blind source separation problem , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[5]  Birger Kollmeier,et al.  Amplitude Modulation Decorrelation For Convolutive Blind Source Separation , 2000 .

[6]  Paris Smaragdis,et al.  Information theoretic approaches to source separation , 1997 .

[7]  Szymon Drgas,et al.  Logatom articulation index evaluation of speech enhanced by blind source separation and single-channel noise reduction , 2008 .

[8]  Jedrzej Kocinski,et al.  Speech intelligibility improvement using convolutive blind source separation assisted by denoising algorithms , 2008, Speech Commun..

[9]  Michael S. Brandstein,et al.  Microphone Arrays - Signal Processing Techniques and Applications , 2001, Microphone Arrays.

[10]  B. Moore An introduction to the psychology of hearing, 3rd ed. , 1989 .

[11]  Lucas C. Parra,et al.  Steerable frequency-invariant beamforming for arbitrary arrays , 2006 .

[12]  A. B.,et al.  SPEECH COMMUNICATION , 2001 .

[13]  Shoko Araki,et al.  Equivalence between Frequency-Domain Blind Source Separation and Frequency-Domain Adaptive Beamforming for Convolutive Mixtures , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  H. Levitt Transformed up-down methods in psychoacoustics. , 1971, The Journal of the Acoustical Society of America.

[15]  Scott Rickard,et al.  Blind separation of speech mixtures via time-frequency masking , 2004, IEEE Transactions on Signal Processing.

[16]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[17]  John H. L. Hansen,et al.  A speech presence microphone array beamformer using model based speech presence probability estimation , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[18]  Lucas C. Parra,et al.  Convolutive blind separation of non-stationary sources , 2000, IEEE Trans. Speech Audio Process..

[19]  Jacob Benesty,et al.  Speech Enhancement , 2010 .

[20]  Chang D. Yoo,et al.  Wavelet speech enhancement based on voiced/unvoiced decision , 2003 .

[21]  Scott C. Douglas,et al.  Convolutive blind separation of speech mixtures using the natural gradient , 2003, Speech Commun..

[22]  Aleksander Sek,et al.  Speech intelligibility in various spatial configurations of background noise , 2005 .

[23]  Richard M. Schwartz,et al.  Enhancement of speech corrupted by acoustic noise , 1979, ICASSP.

[24]  B G Shinn-Cunningham,et al.  Spatial unmasking of nearby speech sources in a simulated anechoic environment. , 2001, The Journal of the Acoustical Society of America.

[25]  E. Owens,et al.  An Introduction to the Psychology of Hearing , 1997 .

[26]  Christine Serviere,et al.  Blind separation of convolutive audio mixtures using nonstationarity , 2003 .

[27]  Hiroshi Sawada,et al.  Frequency domain blind source separation using small and large spacing sensor pairs , 2004, 2004 IEEE International Symposium on Circuits and Systems.

[28]  John H. L. Hansen,et al.  Discrete-Time Processing of Speech Signals , 1993 .

[29]  Yi Zhou,et al.  Blind source separation in frequency domain , 2003, Signal Process..

[31]  Hiroshi Sawada,et al.  Blind Source Separation of Convolutive Mixtures of Speech in Frequency Domain , 2005, IEICE Trans. Fundam. Electron. Commun. Comput. Sci..

[32]  Andrzej Czyzewski,et al.  Contactless hearing aid designed for infants , 2006 .

[33]  Terrence J. Sejnowski,et al.  Speech Enhancement, Gain, and Noise Spectrum Adaptation Using Approximate Bayesian Estimation , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[34]  Kiyotoshi Matsuoka,et al.  A neural net for blind separation of nonstationary signals , 1995, Neural Networks.

[35]  Bhiksha Raj,et al.  Non-negative Hidden Markov Modeling of Audio with Application to Source Separation , 2010, LVA/ICA.

[36]  Guo Wei,et al.  Convolutive Blind Source Separation of Non-stationary Source , 2011 .

[37]  Kiyohiro Shikano,et al.  Speech enhancement and recognition in car environment using blind source separation and subband elimination processing , 2003 .

[38]  B Kollmeier,et al.  Directivity of binaural noise reduction in spatial multiple noise-source arrangements for normal and impaired listeners. , 1997, The Journal of the Acoustical Society of America.

[39]  Paris Smaragdis,et al.  Blind separation of convolved mixtures in the frequency domain , 1998, Neurocomputing.

[40]  Seungjin Choi,et al.  Independent Component Analysis , 2009, Handbook of Natural Computing.

[41]  Don H. Johnson,et al.  Array Signal Processing: Concepts and Techniques , 1993 .

[42]  Y. Ephraim,et al.  Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984 .

[43]  Takeshi Yamada,et al.  Subjective and Objective Quality Assessment for Noise Reduced Speech , 2007 .

[44]  Pascal Scalart,et al.  Speech enhancement based on a priori signal to noise estimation , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[45]  Bhiksha Raj,et al.  Speech denoising using nonnegative matrix factorization with priors , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[46]  Barbara G. Shinn-Cunningham,et al.  Spatial unmasking of nearby speech sources in a simulated anechoic environment , 2000 .

[47]  Kostas Kokkinakis,et al.  Using blind source separation techniques to improve speech recognition in bilateral cochlear implant patients. , 2008, The Journal of the Acoustical Society of America.