Adaptive time-frequency data fusion for speech enhancement

This paper proposes an adaptive time-frequency data fusion technique for the reduction of noise in speech signals using an array of two microphones. Recently, it has been shown that phase-error filtering can be a simple and effective method for "cocktail party" noise removal. However, while this technique is successful when the recorded speech signal is very noisy (less than 10dB signal-to-noise ratio (SNR)), it also tends to severely degrade the signal at higher SNRs. In this paper, the phase-error filtering technique is extended to dynamically estimate the SNR and adjust the filter parameters accordingly. Simulation results show an SNR gain of 11dB (with Gaussian noise) and an SNR gain of 17dB (with speech noise) at low SNRs, without any signal degradation at higher SNRs. Speaker-independent speech recognition results using 5 speakers show that the proposed algorithm achieves a digit recognition accuracy gain of 22% at 0dB and 15% at 20dB.
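The abstract describes two-microphone phase-error filtering whose aggressiveness adapts to an estimated SNR. The sketch below is a minimal illustration of how such a scheme could be organized as time-frequency masking on two-channel STFTs; the mask shape, the per-frame SNR proxy, and names such as `beta` and `delay` are assumptions for illustration only, not the paper's exact formulation.

```python
# Illustrative sketch of SNR-adaptive phase-error masking for a two-microphone
# array. The mask formula and SNR estimate below are assumptions, not the
# algorithm proposed in the paper.
import numpy as np
from scipy.signal import stft, istft

def phase_error_filter(x1, x2, fs, delay=0.0, nperseg=512):
    """Enhance the source aligned at `delay` seconds between the two channels."""
    f, t, X1 = stft(x1, fs=fs, nperseg=nperseg)
    _, _, X2 = stft(x2, fs=fs, nperseg=nperseg)

    # Steer channel 2 toward the target direction, then measure the residual
    # phase error in each time-frequency bin.
    X2_aligned = X2 * np.exp(2j * np.pi * f[:, None] * delay)
    phase_err = np.angle(X1 * np.conj(X2_aligned))

    # Crude per-frame SNR proxy (assumption): bins with small phase error are
    # treated as signal-dominated, the rest as noise-dominated.
    power = np.abs(X1) ** 2
    sig = np.mean(power * (np.abs(phase_err) < 0.5), axis=0)
    noise = np.mean(power * (np.abs(phase_err) >= 0.5), axis=0) + 1e-12
    snr_db = 10.0 * np.log10(sig / noise + 1e-12)

    # Adapt the mask aggressiveness: filter hard at low SNR, gently at high SNR.
    beta = np.clip((20.0 - snr_db) / 20.0, 0.0, 1.0)   # 0 means no filtering
    mask = 1.0 / (1.0 + beta[None, :] * phase_err ** 2)

    _, y = istft(0.5 * (X1 + X2_aligned) * mask, fs=fs, nperseg=nperseg)
    return y
```

With `beta` driven to zero at high estimated SNR, the mask leaves clean speech essentially untouched, which mirrors the paper's goal of avoiding degradation at higher SNRs while still attenuating high-phase-error bins when the input is noisy.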
