A microphone array system for speech source localization, denoising, and dereverberation

There is great potential for advancement in distant-talker speech acquisition research, and a wealth of current and future technology depends on these advances. The goal of this work is to let users roam unfettered in diverse environments while still obtaining a high-quality speech signal that is robust to background noise and reverberation. This thesis presents a microphone array speech enhancement system with three main components: source localization, background noise reduction, and dereverberation. The localization algorithm is effective in the presence of both background noise and reverberation, and it simultaneously produces relative time delay estimates and a source location estimate. It also provides a procedure that improves robustness to environmental degradations and applies to any time delay estimator that maximizes or minimizes an appropriate objective function. The denoising algorithm is a multi-microphone extension of the Minimum Statistics denoising technique [Martin (2001)]. It includes an additional, optional SNR-dependent beamforming stage that is shown to be very useful in certain environments. The final component is a multi-channel dereverberation algorithm that models the speech source and the room reverberation independently. A weighting function is estimated and applied in the wavelet transform domain to de-emphasize portions of the signal that are less coherent across the microphone signals, since low cross-microphone coherence indicates reverberation effects. Results for each component demonstrate the effectiveness of the proposed multi-microphone speech enhancement system.
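The abstract does not name the underlying time delay estimator. The localization procedure is said to apply to estimators that maximize an objective function. As a hedged illustration of that class, the following minimal Python/NumPy sketch implements GCC-PHAT, the generalized cross-correlation with phase transform weighting (see [26] and [29]). It is not the thesis's localization algorithm. The function name `gcc_phat_delay` and its parameters `max_tau` and `eps` are illustrative assumptions.

```python
import numpy as np


def gcc_phat_delay(x, y, fs, max_tau=None, eps=1e-12):
    """Estimate how far y lags x, in seconds, with GCC-PHAT.

    The estimate is the lag that maximizes the PHAT-weighted
    generalized cross-correlation between the two channels.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Zero-pad so the FFT-based (circular) correlation does not wrap around.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    # Cross-power spectrum; its phase encodes the relative delay.
    cross = Y * np.conj(X)
    # PHAT weighting: normalize each bin's magnitude to ~1, keeping only phase.
    cross /= np.abs(cross) + eps
    cc = np.fft.irfft(cross, n=n)
    # Optionally restrict the search to physically plausible lags.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(round(fs * max_tau)), max_shift)
    lags = np.arange(-max_shift, max_shift + 1)
    # Negative indices wrap to the end of cc, where the negative lags are stored.
    best_lag = lags[np.argmax(cc[lags])]
    return float(best_lag) / fs
```

For two channels `x` and `y` sampled at `fs`, `gcc_phat_delay(x, y, fs)` returns a positive value when `y` lags `x`. The PHAT weighting discards the cross-spectrum magnitude and keeps only its phase. This sharpens the correlation peak, which is commonly reported to help in reverberant rooms.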

[1] Shubha Kadambe et al. Application of the wavelet transform for pitch detection of speech signals, 1992, IEEE Trans. Inf. Theory.

[2] Michael S. Brandstein et al. Real-Time Automated Video and Audio Capture with Multiple Cameras and Microphones, 2001, J. VLSI Signal Process.

[3] Mikio Tohyama et al. Source waveform recovery in a reverberant space by cepstrum dereverberation, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] Alan V. Oppenheim et al. All-pole modeling of degraded speech, 1978.

[5] Rainer Martin et al. Noise power spectral density estimation based on optimal smoothing and minimum statistics, 2001, IEEE Trans. Speech Audio Process.

[6] Thomas F. Quatieri et al. Speech analysis/synthesis based on a sinusoidal representation, 1986, IEEE Trans. Acoust. Speech Signal Process.

[7] Rodney A. Kennedy et al. On the poor robustness of sound equalization in reverberant environments, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[8] Wonyong Sung et al. A statistical model-based voice activity detection, 1999, IEEE Signal Processing Letters.

[9] Maurizio Omologo et al. Acoustic source location in a three-dimensional space using crosspower spectrum phase, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10] Jae S. Lim et al. Speech enhancement based on the generalized dual excitation model with adaptive analysis window, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[11] Keun-Sung Bae et al. Speech enhancement with reduction of noise components in the wavelet domain, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12] H. Strube. Determination of the instant of glottal closure from the speech wave, 1974, The Journal of the Acoustical Society of America.

[13] Yannick Mahieux et al. Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering, 1998, IEEE Trans. Speech Audio Process.

[14] Hyeong-Ho Lee et al. A Study of On-Off Characteristics of Conversational Speech, 1986, IEEE Trans. Commun.

[15] Zhen Yang et al. Improved performance of multimicrophone speech enhancement systems, 1993.

[16] A. Gray et al. Unconstrained frequency-domain adaptive filter, 1982.

[17] Michael S. Brandstein et al. Microphone array speech dereverberation using coarse channel modeling, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[18] P. Peterson. Simulating the response of multiple microphones to a single acoustic source in a reverberant room, 1986, The Journal of the Acoustical Society of America.

[19] Athina P. Petropulu et al. Cepstrum-based deconvolution for speech dereverberation, 1996, IEEE Trans. Speech Audio Process.

[20] R. McAulay et al. Speech enhancement using a soft-decision noise suppression filter, 1980.

[21] Andrzej Drygajlo et al. Combined Wiener and coherence filtering in wavelet domain for microphone array speech enhancement, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[22] Akihiko Sugiyama et al. Robust Adaptive Beamforming, 2001, Microphone Arrays.

[23] Hong Wang et al. Voice source localization for automatic camera pointing system in videoconferencing, 1997, Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics.

[24] Michael S. Brandstein et al. Wavelet Transform Extrema Clustering for Multi-Channel Speech Dereverberation, 1999.

[25] Jörg Meyer et al. Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[26] G. Carter et al. The generalized correlation method for estimation of time delay, 1976.

[27] John Mason et al. Robust voice activity detection using cepstral features, 1993, Proceedings of TENCON '93. IEEE Region 10 International Conference on Computers, Communications and Automation.

[28] Eric Moulines et al. HNS: Speech modification based on a harmonic+noise model, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[29] Dae Hee Youn et al. Adaptive phase transform processors for time delay estimation, 1986.

[30] Emile H. L. Aarts et al. Simulated Annealing: Theory and Applications, 1987, Mathematics and Its Applications.

[31] Andrew Sekey et al. An Objective Measure for Predicting Subjective Quality of Speech Coders, 1992, IEEE J. Sel. Areas Commun.

[32] Alan V. Oppenheim et al. Discrete-Time Signal Processing, Vol. 2, 2001.

[33] A. Wasiljeff et al. Adaptive Microphone Arrays for Noise Suppression in the Frequency Domain, 1992.

[34] Benoît Champagne et al. A microphone array processing technique for speech enhancement in a reverberant space, 1996, Speech Communication.

[35] Stéphane Mallat et al. Characterization of Signals from Multiscale Edges, 1992, IEEE Trans. Pattern Anal. Mach. Intell.

[36] Rainer Martin et al. Spectral Subtraction Based on Minimum Statistics, 2001.

[37] Douglas Nelson et al. Glottal pulse alignment in voiced speech for pitch determination, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[38] Jont B. Allen et al. Invertibility of a room impulse response, 1979.

[39] Joerg Bitzer et al. Post-Filtering Techniques, 2001, Microphone Arrays.

[40] Jacob Benesty et al. Microphone arrays for video camera steering, 2000.

[41] M.G. Bellanger et al. Digital processing of speech signals, 1980, Proceedings of the IEEE.

[42] Satoshi Nakamura et al. Localization of multiple sound sources based on a CSP analysis with a microphone array, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[43] Peter Kabal et al. Room speech dereverberation via minimum-phase and all-pass component processing of multi-microphone signals, 1995, IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing. Proceedings.

[44] Mohammad Hasan Savoji et al. A robust algorithm for accurate endpointing of speech signals, 1989, Speech Commun.

[45] Rainer Martin et al. An efficient algorithm to estimate the instantaneous SNR of speech signals, 1993, EUROSPEECH.