Recognition of voiced sounds with a continuous state HMM

Many current speech recognition systems use very large statistical models, with many thousands or even millions of parameters, to account for the variability of speech signals observed in large training corpora, and they represent speech as sequences of discrete, independent events. The mechanisms of speech production, however, are conceptually very simple: they involve the continuous, smooth movement of a small number of speech articulators. We report progress towards a practical implementation of a parsimonious continuous state hidden Markov model, with on the order of several hundred parameters, for recovering voiced phoneme sequences from trajectories of such continuous, dynamic speech production features. We describe automated training of the parameters using a forced alignment procedure, and present results for training and testing on a single speaker.
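To make the idea of a continuous hidden state concrete, here is a minimal sketch. It is not the authors' model, whose equations are not given here. Several assumptions are made for illustration only. The feature is one-dimensional (for example, a single resonance frequency). Within each phone, the hidden state relaxes a fixed fraction `a` of the way towards a phone-specific target on every frame, and it carries over continuously across phone boundaries. The process and observation noise variances `q` and `r` are shared by all phones. Each segment is scored with a scalar Kalman filter. Forced alignment to a known phone sequence is done with a Viterbi-style search that keeps a single filter state per (frame, phone) cell, which makes it an approximation. All names (`kalman_step`, `forced_align`, `Filter`, `targets`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Filter:
    """Gaussian belief about the hidden continuous state (hypothetical)."""
    mean: float
    var: float


def kalman_step(f, y, target, a, q, r):
    """One predict/update step of a scalar Kalman filter.

    Assumed dynamics: x_t = x_{t-1} + a * (target - x_{t-1}) + w,  w ~ N(0, q)
    Assumed observation: y_t = x_t + v,  v ~ N(0, r)
    Returns the updated filter and log p(y_t | y_1..y_{t-1}).
    """
    # Predict: the state moves a fraction `a` of the way towards the target.
    m_pred = f.mean + a * (target - f.mean)
    v_pred = (1 - a) ** 2 * f.var + q
    # Innovation (prediction error) and its variance.
    s = v_pred + r
    e = y - m_pred
    loglik = -0.5 * (math.log(2 * math.pi * s) + e * e / s)
    # Update with the Kalman gain.
    k = v_pred / s
    return Filter(m_pred + k * e, (1 - k) * v_pred), loglik


def forced_align(obs, phones, targets, a=0.3, q=1.0, r=4.0, x0=0.0, v0=100.0):
    """Approximate forced alignment of `obs` to the known sequence `phones`.

    Each cell keeps only the best-scoring predecessor's filter state, so the
    search is a Viterbi-style approximation. Requires len(obs) >= len(phones).
    Returns (log-likelihood of the best path, per-frame phone labels).
    """
    n, T = len(phones), len(obs)
    NEG = float("-inf")
    # cells[t][i] = (score, filter, previous phone index) for phone i at frame t
    cells = [[(NEG, None, None)] * n for _ in range(T)]
    f, ll = kalman_step(Filter(x0, v0), obs[0], targets[phones[0]], a, q, r)
    cells[0][0] = (ll, f, None)
    for t in range(1, T):
        for i in range(n):
            best = (NEG, None, None)
            for j in (i, i - 1):  # stay in phone i, or enter it from phone i-1
                if j < 0:
                    continue
                score, prev_f, _ = cells[t - 1][j]
                if prev_f is None:  # unreachable cell
                    continue
                new_f, ll = kalman_step(prev_f, obs[t], targets[phones[i]], a, q, r)
                if score + ll > best[0]:
                    best = (score + ll, new_f, j)
            cells[t][i] = best
    # Backtrace from the last phone at the last frame.
    labels, i = [], n - 1
    for t in range(T - 1, -1, -1):
        labels.append(phones[i])
        prev = cells[t][i][2]
        if prev is not None:
            i = prev
    labels.reverse()
    return cells[T - 1][n - 1][0], labels
```

In this toy, each phone contributes only its target value. Continuity of the hidden state across boundaries means a wrongly placed boundary produces large prediction errors on the following frames. This is what lets the alignment locate transitions from smooth trajectories.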
