Recognition of isolated words by encoding speech into linear predictive coding (LPC) coefficients is well known and widely accepted as one of the better methods for speech recognition. One drawback of relying entirely on LPC measures for recognition, however, is that the energy information in the speech is removed during the LPC analysis. Consequently, attempts have been made to include energy pattern information along with the LPC pattern information to achieve greater recognition accuracy. This paper discusses the problems involved in combining energy pattern information with LPC pattern information and presents the results of recognition experiments with one such method. The energy information and LPC information are combined linearly on a (speech) frame-by-frame basis, using dynamic time warping (DTW) for time alignment. The LPC log likelihood ratio distance function, which measures the spectral difference between two frames of speech, does not lend itself to direct statistical analysis in multiple dimensions. The weighting for the linear combination is instead obtained by iteratively minimizing a probability-of-error function. The combined energy and LPC distance function was tested on a 129-word "airline" vocabulary designed for speaker-independent, isolated word recognition. Including energy information in the recognition feature space reduces recognition error rates by an average of about 25 percent compared with LPC alone.
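The frame-by-frame linear combination described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the absolute log-energy difference used as the energy distance, the unconstrained symmetric DTW step pattern, the path-length normalization, and the weight `alpha` are all assumptions made for illustration. In particular, the paper obtains its weighting by iteratively minimizing a probability-of-error function, which is not reproduced here.

```python
import numpy as np

def log_likelihood_ratio(a_ref, a_test, r_test):
    """Itakura-style LPC log likelihood ratio between two frames.

    a_ref, a_test: inverse-filter coefficient vectors [1, a1, ..., ap].
    r_test: autocorrelation of the test frame at lags 0..p.
    """
    idx = np.arange(len(r_test))
    # Build the Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    R = r_test[np.abs(idx[:, None] - idx[None, :])]
    return np.log((a_ref @ R @ a_ref) / (a_test @ R @ a_test))

def combined_local_distance(d_lpc, e_ref, e_test, alpha):
    """Linear combination of LPC and energy distances for every frame pair.

    d_lpc: (n_ref, n_test) matrix of LPC distances.
    e_ref, e_test: per-frame log energies (assumed already normalized).
    alpha: hypothetical weight on the energy term.
    """
    d_energy = np.abs(e_ref[:, None] - e_test[None, :])
    return d_lpc + alpha * d_energy

def dtw_distance(local):
    """Accumulate local distances along the best warping path.

    Uses a simple symmetric step pattern with no slope constraints,
    and normalizes by the sum of the two sequence lengths.
    """
    n, m = local.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            D[i, j] = local[i - 1, j - 1] + best_prev
    return D[n, m] / (n + m)
```

In use, one would fill `d_lpc` by calling `log_likelihood_ratio` for each reference/test frame pair, pass it with the two log-energy contours to `combined_local_distance`, and rank reference templates by `dtw_distance` of the result. Setting `alpha = 0` recovers an LPC-only comparison.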