Speech enhancement using voice source models

Autoregressive (AR) models have been shown to be effective models of the human vocal tract during voicing. However, the most common speech model used for enhancement, an AR process excited by white noise, fails to capture the periodic nature of voiced speech. Speech synthesis researchers have long recognized this problem and have developed a variety of sophisticated excitation models, but these models have yet to make an impact in speech enhancement. We have chosen one of the most common excitation models, the four-parameter LF model of Fant, Liljencrants and Lin (1985), and applied it to the enhancement of individual voiced phonemes. A comparison of the conventional white-noise-driven AR model, an impulse-driven AR model, and an AR model driven by the LF excitation shows that the LF model yields a substantial improvement, on the order of 1.3 dB.
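The difference between white-noise and periodic excitation can be illustrated with a minimal sketch. It is not the enhancement method evaluated here and it does not implement the LF model. It only drives one hypothetical second-order AR filter with white noise and with an impulse train, then compares the normalized autocorrelation at the pitch lag. The filter coefficients, the 80-sample pitch period, the signal length, and all function names are assumptions chosen for illustration.

```python
import random

def ar_filter(excitation, coeffs):
    """All-pole (AR) synthesis: s[n] = e[n] + sum_k coeffs[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out

def normalized_autocorr(x, lag):
    """Mean-removed autocorrelation at `lag`, normalized by the lag-0 value."""
    mean = sum(x) / len(x)
    xc = [v - mean for v in x]
    num = sum(xc[i] * xc[i + lag] for i in range(len(xc) - lag))
    den = sum(v * v for v in xc)
    return num / den

# Assumed values, for illustration only.
COEFFS = [1.2, -0.81]   # stable AR(2): pole radius 0.9
PERIOD = 80             # pitch period in samples (100 Hz at an assumed 8 kHz rate)
N = 1600                # 20 pitch periods

rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(N)]
impulses = [1.0 if n % PERIOD == 0 else 0.0 for n in range(N)]

s_white = ar_filter(white, COEFFS)       # conventional white-noise-driven AR
s_impulse = ar_filter(impulses, COEFFS)  # impulse-driven AR

r_white = normalized_autocorr(s_white, PERIOD)
r_impulse = normalized_autocorr(s_impulse, PERIOD)

print("autocorrelation at pitch lag, white-noise excitation:", round(r_white, 3))
print("autocorrelation at pitch lag, impulse excitation:    ", round(r_impulse, 3))
```

With these assumed settings, the impulse-driven output repeats every pitch period and is strongly correlated at that lag, while the white-noise-driven output is not. An LF-type excitation would replace the impulse train with a smooth, parameterized glottal pulse shape.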
