Binaural segregation in multisource reverberant environments

In a natural environment, speech is degraded by both reverberation and concurrent noise sources. Human listeners remain robust under these conditions using only two ears, yet current two-microphone algorithms perform poorly. The psychological process of figure-ground segregation suggests that the target signal is perceived as the foreground while the remaining stimuli form the background. Accordingly, the goal is to estimate an ideal time-frequency (T-F) binary mask, which selects the target in a local T-F unit whenever the target is stronger than the interference there. This paper proposes a binaural segregation system that extracts the reverberant target signal from multisource reverberant mixtures using only the location of the target source. The system combines target cancellation through adaptive filtering with a binary decision rule to estimate the ideal T-F binary mask. The key observation is that the target attenuation produced by adaptive filtering in a T-F unit is correlated with the strength of the target relative to the mixture. A comprehensive evaluation shows that the system yields large SNR gains, and comparisons using both SNR and automatic speech recognition measures show that it outperforms standard two-microphone beamforming approaches and a recent binaural processor.
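To make the "target cancellation plus binary decision" idea concrete, the following is a minimal Python sketch, not the paper's actual system. It assumes an NLMS adaptive canceller that runs continuously (the real system exploits the known target location and a more careful adaptation scheme), and the STFT parameters, filter order, and 6 dB attenuation threshold are illustrative choices: T-F units where cancellation removes much of the energy are labeled target-dominated (mask = 1), and the rest are labeled interference-dominated (mask = 0).

```python
# Illustrative sketch of target cancellation + binary mask estimation.
# NLMS settings, STFT parameters, and the attenuation threshold are
# assumptions for clarity, not the configuration used in the paper.
import numpy as np
from scipy.signal import stft

def nlms_cancel(primary, reference, order=64, mu=0.1, eps=1e-8):
    """Adaptively filter `reference` to cancel the target in `primary`.
    Returns the cancellation residual (target largely removed)."""
    w = np.zeros(order)
    residual = np.zeros_like(primary, dtype=float)
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]           # most recent samples first
        e = primary[n] - np.dot(w, x)              # residual after cancellation
        w += mu * e * x / (np.dot(x, x) + eps)     # NLMS weight update
        residual[n] = e
    return residual

def estimate_binary_mask(left, right, fs, atten_thresh_db=6.0):
    """Estimate a binary T-F mask from per-unit target attenuation.

    High attenuation (mixture energy >> residual energy) suggests the
    unit is target-dominated; low attenuation suggests interference."""
    residual = nlms_cancel(left, right)
    _, _, L = stft(left, fs=fs, nperseg=512, noverlap=384)
    _, _, R = stft(residual, fs=fs, nperseg=512, noverlap=384)
    atten_db = 10.0 * np.log10((np.abs(L) ** 2 + 1e-12) /
                               (np.abs(R) ** 2 + 1e-12))
    return (atten_db > atten_thresh_db).astype(float)
```

The threshold on per-unit attenuation plays the role of the binary decision rule described above; applying the resulting mask to the STFT of one ear signal and inverting would give the segregated reverberant target under these simplified assumptions.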
