A syllable-based isolated word recognition experiment

Automatic recognition of a very large or, eventually, unlimited vocabulary requires recognition units smaller than the word. A series of experiments was carried out to evaluate the syllable as a concatenative unit for large vocabularies. For isolated word recognition from syllable units, a reference pattern was created for each word of the lexicon by concatenating isolated syllable templates. The test utterance was then matched against each word template by dynamic programming. To build the word reference pattern, the syllable templates were adjusted using the VLTS procedure and by downsampling the beginning and end parts of each syllable. The syllable approach was compared with the classical whole-word approach on the same 10,400-word vocabulary. The syllable dictionary requires one sixth of the storage needed for the whole-word dictionary. For a trained speaker, the recognition error rate was 12% with the syllable approach, compared with 6% with the whole-word approach. This difference may be reduced by using syllable templates extracted from words, which would take coarticulation effects between syllables into account.
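The matching scheme described above (concatenate syllable templates into a word reference, then align the test utterance by dynamic programming) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names, the one-dimensional toy features, the Euclidean frame distance, the symmetric step pattern, and the length normalization are all hypothetical, and the VLTS adjustment and boundary downsampling are omitted.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two feature frames (assumed local metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(test, ref):
    """Length-normalized dynamic time warping distance between two frame sequences."""
    n, m = len(test), len(ref)
    inf = float("inf")
    # acc[i][j] holds the cheapest cumulative cost of aligning test[:i] with ref[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(test[i - 1], ref[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)

def build_word_template(syllables, syllable_templates):
    """Concatenate isolated syllable templates into one word reference pattern."""
    frames = []
    for s in syllables:
        frames.extend(syllable_templates[s])
    return frames

def recognize(test, lexicon, syllable_templates):
    """Return the lexicon word whose concatenated template is closest to the test."""
    return min(
        lexicon,
        key=lambda w: dtw_distance(
            test, build_word_template(lexicon[w], syllable_templates)
        ),
    )

# Toy data: hypothetical syllables with one-dimensional "features".
syllable_templates = {
    "ba": [[0.0], [1.0]],
    "to": [[5.0], [6.0]],
    "ki": [[9.0], [9.5]],
}
lexicon = {"bato": ["ba", "to"], "kito": ["ki", "to"], "baki": ["ba", "ki"]}
test_utterance = [[0.1], [0.9], [1.1], [5.2], [6.1]]

best = recognize(test_utterance, lexicon, syllable_templates)
```

In this sketch only the syllable templates are stored, and each word is represented by its list of syllable labels, which is where a storage saving over whole-word templates would come from. The toy inventory is not meant to reproduce the reported one-sixth ratio.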
