Paper information - A microphone array system for speech source localization, denoising, and dereverberation

There is a great deal of potential for advancement in distant-talker speech acquisition research, and a wealth of current and future technology depends upon these advances. The goal of this work is to let users roam unfettered in diverse environments while still providing a high-quality speech signal that is robust to background noise and reverberation. In this thesis, a microphone array speech enhancement system is presented which has three main components: source localization, background noise reduction, and dereverberation. The localization algorithm is effective in the presence of both background noise and reverberation, and it simultaneously produces relative time delay estimates and a source location estimate. It provides a procedure applicable to any time delay estimator that maximizes or minimizes an appropriate objective function, improving the estimator's robustness to environmental degradations. The denoising algorithm is a multi-microphone extension of the Minimum Statistics denoising technique [Martin (2001)]. This algorithm also has an optional SNR-dependent beamforming stage that is shown to be very useful in certain environments. The final component is a multi-channel dereverberation algorithm which models the speech source and room reverberation independently. A weighting function is estimated and applied in the Wavelet Transform domain to de-emphasize portions of the signal that are less coherent across microphones, since low coherence indicates reverberation. Results for each component demonstrate the effectiveness of the proposed multi-microphone speech enhancement system.
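To make the phrase "time delay estimators which either maximize or minimize an appropriate objective function" concrete, the following is a minimal sketch of one such estimator: plain cross-correlation between two microphone signals, with the delay taken as the lag that maximizes the correlation. It is not the localization algorithm of the thesis, which the abstract does not specify in detail. The function names, the synthetic white-noise source, and all parameter values are hypothetical choices made for illustration.

```python
import random


def cross_correlation(x, y, lag):
    """Objective function: sum of x[n] * y[n + lag] over overlapping samples."""
    total = 0.0
    for n in range(len(x)):
        m = n + lag
        if 0 <= m < len(y):
            total += x[n] * y[m]
    return total


def estimate_delay(x, y, max_lag):
    """Return the lag in samples that maximizes the cross-correlation.

    A positive result means y is a delayed copy of x.
    """
    candidate_lags = range(-max_lag, max_lag + 1)
    return max(candidate_lags, key=lambda lag: cross_correlation(x, y, lag))


def make_test_signals(true_delay=7, length=400, noise_std=0.3, seed=0):
    """Synthetic two-microphone recording (illustrative assumption only).

    The source is white Gaussian noise; the second microphone receives it
    true_delay samples later, and both channels get independent sensor noise.
    """
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mic1 = [s + rng.gauss(0.0, noise_std) for s in source]
    mic2 = [
        (source[n - true_delay] if n >= true_delay else 0.0)
        + rng.gauss(0.0, noise_std)
        for n in range(length)
    ]
    return mic1, mic2


mic1, mic2 = make_test_signals()
estimated = estimate_delay(mic1, mic2, max_lag=20)
print("estimated delay in samples:", estimated)
```

In this sketch the search is exhaustive over integer lags, which keeps the objective function explicit. A practical system would more likely compute the correlation in the frequency domain and apply a weighting such as the one in the generalized correlation method [26].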

Michael S. Brandstein | Scott M. Griebel

[1] Shubha Kadambe, et al. Application of the wavelet transform for pitch detection of speech signals, 1992, IEEE Trans. Inf. Theory.

[2] Michael S. Brandstein, et al. Real-Time Automated Video and Audio Capture with Multiple Cameras and Microphones, 2001, J. VLSI Signal Process.

[3] Mikio Tohyama, et al. Source waveform recovery in a reverberant space by cepstrum dereverberation, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] Alan V. Oppenheim, et al. All-pole modeling of degraded speech, 1978.

[5] Rainer Martin, et al. Noise power spectral density estimation based on optimal smoothing and minimum statistics, 2001, IEEE Trans. Speech Audio Process.

[6] Thomas F. Quatieri, et al. Speech analysis/Synthesis based on a sinusoidal representation, 1986, IEEE Trans. Acoust. Speech Signal Process.

[7] Rodney A. Kennedy, et al. On the poor robustness of sound equalization in reverberant environments, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[8] Wonyong Sung, et al. A statistical model-based voice activity detection, 1999, IEEE Signal Processing Letters.

[9] Maurizio Omologo, et al. Acoustic source location in a three-dimensional space using crosspower spectrum phase, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10] Jae S. Lim, et al. Speech enhancement based on the generalized dual excitation model with adaptive analysis window, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[11] Keun-Sung Bae, et al. Speech enhancement with reduction of noise components in the wavelet domain, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12] H. Strube. Determination of the instant of glottal closure from the speech wave, 1974, The Journal of the Acoustical Society of America.

[13] Yannick Mahieux, et al. Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering, 1998, IEEE Trans. Speech Audio Process.

[14] Hyeong-Ho Lee, et al. A Study of On-Off Characteristics of Conversational Speech, 1986, IEEE Trans. Commun.

[15] Zhen Yang, et al. Improved performance of multimicrophone speech enhancement systems, 1993.

[16] A. Gray, et al. Unconstrained frequency-domain adaptive filter, 1982.

[17] Michael S. Brandstein, et al. Microphone array speech dereverberation using coarse channel modeling, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[18] P. Peterson. Simulating the response of multiple microphones to a single acoustic source in a reverberant room, 1986, The Journal of the Acoustical Society of America.

[19] Athina P. Petropulu, et al. Cepstrum-based deconvolution for speech dereverberation, 1996, IEEE Trans. Speech Audio Process.

[20] R. McAulay, et al. Speech enhancement using a soft-decision noise suppression filter, 1980.

[21] Andrzej Drygajlo, et al. Combined Wiener and coherence filtering in wavelet domain for microphone array speech enhancement, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[22] Akihiko Sugiyama, et al. Robust Adaptive Beamforming, 2001, Microphone Arrays.

[23] Hong Wang, et al. Voice source localization for automatic camera pointing system in videoconferencing, 1997, Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics.

[24] Michael S. Brandstein, et al. Wavelet transform extrema clustering for multi-channel speech dereverberation, 1999.

[25] Jörg Meyer, et al. Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[26] G. Carter, et al. The generalized correlation method for estimation of time delay, 1976.

[27] John Mason, et al. Robust voice activity detection using cepstral features, 1993, Proceedings of TENCON '93. IEEE Region 10 International Conference on Computers, Communications and Automation.

[28] Eric Moulines, et al. HNS: Speech modification based on a harmonic+noise model, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[29] Dae Hee Youn, et al. Adaptive phase transform processors for time delay estimation, 1986.

[30] Emile H. L. Aarts, et al. Simulated Annealing: Theory and Applications, 1987, Mathematics and Its Applications.

[31] Andrew Sekey, et al. An Objective Measure for Predicting Subjective Quality of Speech Coders, 1992, IEEE J. Sel. Areas Commun.

[32] Alan V. Oppenheim, et al. Discrete-time Signal Processing. Vol.2, 2001.

[33] A. Wasiljeff, et al. Adaptive Microphone Arrays for Noise Suppression in the Frequency Domain, 1992.

[34] Benoît Champagne, et al. A microphone array processing technique for speech enhancement in a reverberant space, 1996, Speech Communication.

[35] Stéphane Mallat, et al. Characterization of Signals from Multiscale Edges, 1992, IEEE Trans. Pattern Anal. Mach. Intell.

[36] Rainer Martin, et al. Spectral Subtraction Based on Minimum Statistics, 2001.

[37] Douglas Nelson, et al. Glottal pulse alignment in voiced speech for pitch determination, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[38] Jont B. Allen, et al. Invertibility of a room impulse response, 1979.

[39] Joerg Bitzer, et al. Post-Filtering Techniques, 2001, Microphone Arrays.

[40] Jacob Benesty, et al. Microphone arrays for video camera steering, 2000.

[41] M. G. Bellanger, et al. Digital processing of speech signals, 1980, Proceedings of the IEEE.

[42] Satoshi Nakamura, et al. Localization of multiple sound sources based on a CSP analysis with a microphone array, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[43] Peter Kabal, et al. Room speech dereverberation via minimum-phase and all-pass component processing of multi-microphone signals, 1995, IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing. Proceedings.

[44] Mohammad Hasan Savoji, et al. A robust algorithm for accurate endpointing of speech signals, 1989, Speech Commun.

[45] Rainer Martin, et al. An efficient algorithm to estimate the instantaneous SNR of speech signals, 1993, EUROSPEECH.