Formant tracking using hidden Markov models and vector quantization

This paper describes an approach to formant tracking based on hidden Markov models and vector quantization of LPC spectra. Two general classes of models are developed, differing in whether formants are tracked singly or jointly. The states of a single-formant model are scalar values corresponding to possible formant frequencies. The states of a multiformant model are frequency vectors defining possible formant configurations. Formant detection and estimation are performed simultaneously using the forward-backward algorithm. Model parameters are estimated from hand-marked formant tracks. The models have been evaluated using portions of the Texas Instruments multidialect connected digits database. The most accurate configurations exhibited root-mean-square estimation errors of about 70, 95, and 140 Hz for F1, F2, and F3, respectively.
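As an illustration of the single-formant idea, the following is a minimal sketch of the forward-backward algorithm on a discrete HMM whose states are candidate formant frequencies and whose observations are vector-quantizer codebook indices. It is not the paper's implementation. Everything specific here is an assumption made for the example: the three candidate frequencies, the extra "no formant" state (written as None) used to model detection, the four-symbol codebook, the hand-picked probabilities (the paper estimates its parameters from hand-marked tracks), and the choice of the most probable state per frame as the estimate.

```python
def forward_backward(init, trans, emit, obs):
    """Return per-frame posterior state probabilities for a discrete HMM."""
    n, T = len(init), len(obs)

    # Forward pass, normalized at every frame to avoid underflow.
    alpha = []
    prev = [init[s] * emit[s][obs[0]] for s in range(n)]
    total = sum(prev)
    prev = [a / total for a in prev]
    alpha.append(prev)
    for t in range(1, T):
        cur = [emit[j][obs[t]] * sum(prev[i] * trans[i][j] for i in range(n))
               for j in range(n)]
        total = sum(cur)
        prev = [a / total for a in cur]
        alpha.append(prev)

    # Backward pass; the per-frame scale is arbitrary because the
    # posteriors are renormalized below.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        raw = [sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                   for j in range(n))
               for i in range(n)]
        total = sum(raw)
        beta[t] = [b / total for b in raw]

    # Posterior probability of each state at each frame.
    gamma = []
    for t in range(T):
        g = [alpha[t][s] * beta[t][s] for s in range(n)]
        total = sum(g)
        gamma.append([x / total for x in g])
    return gamma


# Hypothetical toy model: state 0 means "no formant present";
# the other states are candidate formant frequencies in Hz.
states = [None, 300.0, 500.0, 700.0]
n = len(states)
init = [1.0 / n] * n
# Assumed transition probabilities: the track tends to stay where it is.
trans = [[0.7 if i == j else 0.1 for j in range(n)] for i in range(n)]
# Assumed emission probabilities: codebook symbol k is most likely in state k.
emit = [[0.85 if s == k else 0.05 for k in range(n)] for s in range(n)]

# Assumed sequence of VQ codebook indices, one per analysis frame.
obs = [0, 0, 1, 1, 2, 2, 3, 3]

gamma = forward_backward(init, trans, emit, obs)
# Detection and estimation in one step: take the most probable state per frame.
track = [states[max(range(n), key=lambda s: gamma[t][s])]
         for t in range(len(obs))]
print(track)
```

Because the "no formant" state competes with the frequency states in the same posterior, a single pass yields both the decision that a formant is present and its estimated frequency. A multiformant variant would use the same routine with each state standing for a vector of frequencies rather than a scalar.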
