Connected sentence recognition using diphone-like templates

A template-based connected speech recognition system that represents words as sequences of diphone-like segments has been implemented and tested on a database of 50 phonetically balanced sentences uttered 5 times by a single male talker. The sentences contain 250 words, of which 80% are monosyllabic. The inventory of segments is divided into two principal classes: single-phone segments, such as vowels, nasals, fricatives, and stop bursts, and diphone segments, including consonant-vowel, vowel-consonant, and consonant-consonant combinations. Words are represented by network models whose nodes are these segments. Word models incorporate juncture branches to and from other words. A total of 400 segments are required to represent the 250 vocabulary words. Templates representing these segments are extracted from a database of 450 training sentences uttered by the same talker. Recognition is carried out by a series of matching and search processes, applied successively to segments, words, word strings, and sentences. Performance to date is 63% correct recognition of content words and approximately 30% correct recognition of function words.
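The idea of a word model as a network whose nodes are segments can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the segment labels, the toy word "sun", the `<in>` and `<out>` placeholders standing in for juncture branches, and the helper `segment_paths` are hypothetical and are not taken from the system described above.

```python
from typing import Dict, List

# A word network maps each segment node to the nodes that may follow it.
# Labels are hypothetical; the actual segment inventory is not given here.
WordNetwork = Dict[str, List[str]]


def segment_paths(network: WordNetwork, start: str, end: str) -> List[List[str]]:
    """Enumerate every segment sequence from start to end in an acyclic network."""
    if start == end:
        return [[end]]
    paths: List[List[str]] = []
    for nxt in network.get(start, []):
        for tail in segment_paths(network, nxt, end):
            paths.append([start] + tail)
    return paths


# Toy model of the word "sun". Single-phone nodes: "s", "uh", "n".
# Diphone nodes: "s-uh", "uh-n". "<in>" and "<out>" are placeholders for
# juncture branches from the previous word and to the next word.
sun: WordNetwork = {
    "<in>": ["s"],
    "s": ["s-uh"],
    "s-uh": ["uh", "uh-n"],  # the steady-state vowel node is optional
    "uh": ["uh-n"],
    "uh-n": ["n"],
    "n": ["<out>"],
}

paths = segment_paths(sun, "<in>", "<out>")
for path in paths:
    print(" -> ".join(path))
```

Each enumerated path is one admissible segment sequence for the word. In a template-based recognizer, a sequence of this kind would be scored against the input by matching the templates of its segments, a step this sketch does not attempt.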
