Experiments in syllable-based recognition of continuous speech

An exploratory implementation of a syllable-based recognizer is described. Continuous speech is first divided into syllabic units, and the units are then matched against syllable templates using a dynamic programming algorithm. A hierarchical transition network is used to limit the syllable search to possible continuations of the current partial sentence hypotheses. Competing hypotheses are pruned by a 'beam search'. Experiments are reported on automatic recognition of English sentences spoken by one male speaker, using a 70-word vocabulary and a restricted syntax; 85% of the sentences were correctly recognized. Comparable results were obtained for a similar task in French using a female speaker. The method is computationally efficient: real-time performance on dedicated hardware should be obtainable without difficulty. A method of scaling the distance measures used in the syllable matching is also described. This scaling takes into account variability in syllable production, both as a function of position within the syllable and as a function of the various spectral parameters being used.
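
As a rough illustration of the template-matching step and the distance scaling described above, the following Python sketch aligns a syllabic segment with a syllable template by dynamic programming. It is not the implementation reported here. The function names, the weighted Euclidean frame distance, the symmetric step pattern, the length normalisation and the table of weights indexed by template position and spectral parameter are all assumptions made for the example.

```python
import math


def scaled_frame_distance(frame, template_frame, weights):
    """Weighted Euclidean distance between an input frame and a template frame.

    weights[k] is a hypothetical scale factor for spectral parameter k at this
    template position; a small weight discounts a parameter that varies a lot.
    """
    return math.sqrt(sum(w * (a - b) ** 2
                         for a, b, w in zip(frame, template_frame, weights)))


def dtw_distance(segment, template, weights):
    """Dynamic-programming alignment cost between a segment and a template.

    segment:  list of frames, each a list of spectral parameters
    template: list of frames with the same number of parameters
    weights:  weights[j][k] scales parameter k at template position j
    """
    n, m = len(segment), len(template)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning the first i segment
    # frames with the first j template frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = scaled_frame_distance(segment[i - 1], template[j - 1],
                                      weights[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # segment frame repeated
                                 cost[i][j - 1],      # template frame repeated
                                 cost[i - 1][j - 1])  # both advance
    # Assumed normalisation so that long and short templates are comparable.
    return cost[n][m] / (n + m)
```

Under these assumptions, a recognizer would compute `dtw_distance` for each syllable template allowed by the transition network and retain only the lowest-scoring hypotheses, which is the role of the beam search.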