Cepstral domain talker stress compensation for robust speech recognition

A study of talker-stress-induced intraword variability and an algorithm that compensates for the systematic changes observed are presented. The study is based on hidden Markov models trained by speech tokens spoken in various talking styles. The talking styles include normal speech, fast speech, loud speech, soft speech, and taking with noise injected through earphones; the styles are designed to simulate speech produced under real stressful conditions. Cepstral coefficients are used as the parameters in the hidden Markov models. The stress compensation algorithm compensates for the variations in the cepstral coefficients in a hypothesis-driven manner. The functional form of the compensation is shown to correspond to the equalization of spectral tilts. Substantial reduction of error rates has been achieved when the cepstral domain compensation techniques were tested on the simulated-stress speech database. The hypothesis-driven compensation technique reduced the average error rate from 13.9% to 6.2%. When a more sophisticated recognizer was used, it reduced the error rate from 2.5% to 1.9%. >

[1]  L. Baum,et al.  A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .

[2]  H. Lane,et al.  The Lombard Sign and the Role of Hearing in Speech , 1971 .

[3]  K. Stevens,et al.  Emotions and speech: some acoustical correlates. , 1972, The Journal of the Acoustical Society of America.

[4]  Jr. G. Forney,et al.  The viterbi algorithm , 1973 .

[5]  J. Baker,et al.  The DRAGON system--An overview , 1975 .

[6]  F. Jelinek,et al.  Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.

[7]  J. Pearl,et al.  Comparison of the cosine and Fourier transforms of Markov-1 signals , 1976 .

[8]  Simonov Pv,et al.  Analysis of the human voice as a method of controlling emotional state: achievements and goals. , 1977 .

[9]  G. R. Doddington,et al.  Computers: Speech recognition: Turning theory to practice: New ICs have brought the requisite computer power to speech technology; an evaluation of equipment shows where it stands today , 1981, IEEE Spectrum.

[10]  A. B. Poritz,et al.  Linear predictive hidden Markov models and the speech signal , 1982, ICASSP.

[11]  L. R. Rabiner,et al.  An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition , 1983, The Bell System Technical Journal.

[12]  L. Streeter,et al.  Acoustic and perceptual indicators of emotional stress. , 1983, The Journal of the Acoustical Society of America.

[13]  David B. Pisoni,et al.  Some acoustic-phonetic correlates of speech produced in noise , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  George R. Doddington,et al.  Recognition of speech under stress and in noise , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[15]  R. Lippmann,et al.  Multi‐style training for robust speech recognition under stress , 1986 .

[16]  D. B. Paul A speaker-stress resistant HMM isolated word recognizer , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.