Paper information - A preliminary study on the use of demisyllables in automatic speech recognition

A speech recognition system is described for recognizing isolated words from reference templates created by concatenating demisyllables from a corpus of about 1000 demisyllables. The composition (in terms of demisyllables) of each reference word is specified in a lexicon with one or more entries for each word of the vocabulary. Experiments were carried out, using a 100-word vocabulary, to investigate the usefulness of such a representation and the effect on performance of some simple modifications in demisyllable specification and durations of reference patterns. Recognition accuracy of 97.6% was obtained using 132 reference templates for the 100-word vocabulary.
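The template-building idea in the abstract can be illustrated with a minimal sketch. It assumes a toy demisyllable inventory, a toy lexicon, two-dimensional feature frames, a squared Euclidean frame distance, and plain dynamic time warping for matching. None of these specifics come from the abstract, which does not state the features or the matching procedure used; all names (`DEMISYLLABLES`, `LEXICON`, `build_templates`, `dtw_distance`, `recognize`) are hypothetical.

```python
from math import inf

# Hypothetical toy inventory: demisyllable name -> list of feature frames.
# The real corpus has about 1000 demisyllables; these values are made up.
DEMISYLLABLES = {
    "se": [[1.0, 0.0], [0.8, 0.2]],
    "ev": [[0.5, 0.5], [0.3, 0.7]],
    "ven": [[0.1, 0.9], [0.0, 1.0]],
    "tu": [[0.9, 0.9], [0.7, 0.7]],
    "u": [[0.6, 0.6], [0.4, 0.4]],
}

# Hypothetical lexicon: each word has one or more demisyllable sequences.
LEXICON = {
    "seven": [["se", "ev", "ven"], ["se", "ven"]],
    "two": [["tu", "u"]],
}


def build_templates(lexicon, inventory):
    """Concatenate demisyllable frames into one template per lexicon entry."""
    templates = []
    for word, entries in lexicon.items():
        for entry in entries:
            frames = [frame for name in entry for frame in inventory[name]]
            templates.append((word, frames))
    return templates


def frame_distance(a, b):
    """Squared Euclidean distance between two frames (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_distance(seq_a, seq_b):
    """Unconstrained dynamic time warping cost between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the word whose template has the lowest DTW cost to the utterance."""
    return min(templates, key=lambda t: dtw_distance(utterance, t[1]))[0]


templates = build_templates(LEXICON, DEMISYLLABLES)
utterance = DEMISYLLABLES["tu"] + DEMISYLLABLES["u"]
print(len(templates), recognize(utterance, templates))
```

Because a word may have several lexicon entries, the number of templates can exceed the vocabulary size, which mirrors the 132 templates reported for the 100-word vocabulary.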

[1] N. Umeda, et al. Letter: Effect of speaking mode on temporal factors in speech: vowel duration, 1974, The Journal of the Acoustical Society of America.

[2] O. Fujimura, et al. Syllable as a unit of speech recognition, 1975.

[3] P. Mermelstein, et al. A phonetic-context controlled strategy for segmentation and phonetic labeling of speech, 1975.

[4] N. Umeda. Vowel duration in American English, 1975, The Journal of the Acoustical Society of America.

[5] F. Itakura, et al. Minimum prediction residual principle applied to speech recognition, 1975.

[6] Aaron E. Rosenberg, et al. Speaker-independent recognition of isolated words using clustering techniques, 1979.

[7] L. Rabiner, et al. A simplified, robust training procedure for speaker trained, isolated word recognition systems, 1980.

[8] C. Myers, et al. A level building dynamic time warping algorithm for connected word recognition, 1981.