An investigation of PLP and IMELDA acoustic representations and of their potential for combination

Two acoustic representations, integrated Mel-scale representation with LDA (IMELDA) and perceptual linear prediction-root power sums (PLP-RPS), both of which have given good results in speech recognition tests, are explored. IMELDA is examined in the context of some related representations. Results of speaker-dependent and independent tests with digits and the alphabet suggest that the optimum PLP order is high and that the effectiveness of PLP-RPS stems not from its modeling of perceptual properties but from its approximation to a desirable statistical property attained exactly by IMELDA. A combined PLP-IMELDA representation is found to be generally more effective than PLP-RPS, but an IMELDA representation derived directly from a filter-bank provides similar results to PLP-IMELDA at a lower computational cost.<<ETX>>

[1]  H. Hermansky,et al.  Optimization of perceptually-based ASR front-end (automatic speech recognition) , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[2]  Hynek Hermansky,et al.  Perceptually based linear predictive analysis of speech , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  Hynek Hermansky,et al.  The effective second formant F2' and the vocal tract front-cavity , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[4]  C. Lefebvre,et al.  A comparison of several acoustic representations for speech recognition with degraded and undegraded speech , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[5]  Lalit R. Bahl,et al.  Speech recognition with continuous-parameter hidden Markov models , 1987, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[6]  M. Hunt,et al.  Speaker dependent and independent speech recognition experiments with an auditory model , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[7]  Brian Hanson,et al.  Robust speaker-independent word recognition using static, dynamic and acceleration features: experiments with Lombard and noisy speech , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[8]  H. Hermansky,et al.  An efficient speaker-independent automatic speech recognition by simulation of some properties of human auditory perception , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.