A computer system is described in which isolated words, spoken by a designated talker, are recognized through calculation of a minimum prediction residual. A reference pattern for each word to be recognized is stored as a time pattern of linear prediction coefficients (LPC). The total log prediction residual of an input signal is minimized by optimally registering the reference LPC onto the input autocorrelation coefficients using the dynamic programming algorithm (DP). The input signal is recognized as the reference word which produces the minimum prediction residual. A sequential decision procedure is used to reduce the amount of computation in DP. A frequency normalization with respect to the long-time spectral distribution is used to reduce effects of variations in the frequency response of telephone connections. The system has been implemented on a DDP-516 computer for the 200-word recognition experiment. The recognition rate for a designated male talker is 97.3 percent for telephone input, and the recognition time is about 22 times real time.
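The matching step summarized above can be illustrated with a minimal Python sketch. It is not the system's actual implementation. The log prediction residual is taken here as the log ratio of the residual left by the reference LPC on the input autocorrelation to the minimum residual attainable for that input frame. The function names, the non-overlapping framing, the simple three-way DP step rule, and the synthetic "low"/"high" test words are all assumptions made for illustration. The sequential decision procedure and the frequency normalization are not modeled.

```python
import numpy as np

def frame_autocorr(signal, frame_len, order):
    """Autocorrelation lags 0..order for each non-overlapping frame (assumed framing)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    return [np.array([f[:frame_len - k] @ f[k:] for k in range(order + 1)])
            for f in frames]

def toeplitz(r):
    """Symmetric Toeplitz matrix whose first row is r."""
    lags = np.abs(np.subtract.outer(np.arange(len(r)), np.arange(len(r))))
    return r[lags]

def lpc(r):
    """Inverse-filter coefficients [1, a1..ap] minimizing a^T R a for autocorrelation r."""
    a = np.linalg.solve(toeplitz(r[:-1]), -r[1:])
    return np.concatenate(([1.0], a))

def log_residual_ratio(a_ref, r_in):
    """Log of (residual using reference LPC) / (minimum residual) on one input frame."""
    R = toeplitz(r_in)
    a_in = lpc(r_in)
    return np.log((a_ref @ R @ a_ref) / (a_in @ R @ a_in))

def total_residual(ref_lpc, in_autocorr):
    """Accumulated log residual along the best DP alignment (simple step rule, assumed)."""
    d = np.array([[log_residual_ratio(a, r) for r in in_autocorr] for a in ref_lpc])
    n_ref, n_in = d.shape
    D = np.full((n_ref, n_in), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(n_ref):
        for j in range(n_in):
            if i == 0 and j == 0:
                continue
            best = min(D[i - 1, j] if i > 0 else np.inf,
                       D[i, j - 1] if j > 0 else np.inf,
                       D[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            D[i, j] = d[i, j] + best
    return D[-1, -1]

def recognize(references, in_autocorr):
    """Return the reference word with the minimum accumulated prediction residual."""
    return min(references, key=lambda name: total_residual(references[name], in_autocorr))

# Hypothetical two-word vocabulary: low-pass and high-pass filtered noise.
ORDER, FRAME = 4, 200
rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)
words = {"low": np.convolve(noise, [1.0, 0.9]),
         "high": np.convolve(noise, [1.0, -0.9])}
references = {name: [lpc(r) for r in frame_autocorr(x, FRAME, ORDER)]
              for name, x in words.items()}
test_input = frame_autocorr(words["low"], FRAME, ORDER)
result = recognize(references, test_input)
```

Because the input frame's own LPC minimizes the residual, each local distance is non-negative and is zero only when the reference coefficients fit the input frame as well as its own do. This is why choosing the word with the smallest accumulated value is a sensible decision rule in this sketch.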