An exploratory implementation of a syllable-based recognizer is described. Continuous speech is first divided into syllabic units, which are then matched against syllable templates using a dynamic programming algorithm. A hierarchical transition network limits the syllable search to possible continuations of the current partial sentence hypotheses, and competing hypotheses are pruned by a beam search. Experiments are reported on automatic recognition of English sentences with a 70-word vocabulary and restricted syntax, produced by one male speaker; 85% of the sentences were correctly recognized. Comparable results were obtained for a similar task in French using a female speaker. The method is computationally efficient: real-time performance on dedicated hardware should be obtainable without difficulty. A method of scaling the distance measures used in the syllable matching is also described. This scaling takes into account variability in syllable production, both as a function of position within the syllable and as a function of the various spectral parameters being used.
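The abstract does not give the matching algorithm in detail, so the following is only a minimal sketch of the general idea: a textbook dynamic time warping (DTW) comparison between a syllable template and a segmented syllable, with a distance that is weighted per template frame and per spectral parameter. The function names, the weight layout (`weights[i][k]` for template frame `i` and parameter `k`), the squared-difference metric, and the unconstrained step pattern are all assumptions made for illustration, not the authors' implementation.

```python
import math


def frame_distance(t_frame, u_frame, w_frame):
    """Weighted squared difference between one template frame and one input frame.

    w_frame holds one (assumed) weight per spectral parameter for this
    position in the template.
    """
    return sum(w * (t - u) ** 2 for t, u, w in zip(t_frame, u_frame, w_frame))


def dtw_distance(template, segment, weights):
    """Accumulated distance of the best alignment between template and segment.

    template, segment: lists of frames (each frame is a list of floats).
    weights: one weight vector per template frame (hypothetical scaling table).
    """
    n, m = len(template), len(segment)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(template[i - 1], segment[j - 1], weights[i - 1])
            # Allow a diagonal step, or a repeat of either sequence's frame.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_template(templates, segment):
    """Return the label whose template has the smallest DTW distance.

    templates maps a syllable label to a (frames, weights) pair.
    """
    return min(
        templates,
        key=lambda label: dtw_distance(templates[label][0], segment, templates[label][1]),
    )


# Toy one-parameter example with made-up values.
toy_templates = {
    "ba": ([[0.0], [2.0]], [[1.0], [1.0]]),
    "di": ([[5.0], [5.0]], [[1.0], [1.0]]),
}
toy_segment = [[0.0], [1.0], [2.0]]
closest = best_template(toy_templates, toy_segment)
```

Lowering a weight at a given template position makes mismatches there cheaper, which is one simple way to tolerate parts of a syllable that vary more from one production to the next. In a full recognizer, a search such as the one described would call a matcher of this kind only for syllables that the grammar allows as continuations of the surviving hypotheses.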