A new speech enhancement technique with application to speaker identification

Speaker identification (SI) and speech recognition systems are highly sensitive to ambient noise conditions. Their performance is known to be highest when they are trained under conditions that approximate the testing environment. However, these systems generally do not perform well when there is no noise or when the level or characteristics of the noise change. A new speech enhancement technique is presented that, when used with a standard cepstrum-based SI system, significantly reduces the sensitivity of the SI system to mismatches in the noise environment between training and testing.
