Perceptually-based features in ASR

Perceptually-based linear predictive (PLP) speech analysis, as proposed by Hermansky (1985), can yield marked benefits in automatic speech recognition (ASR) systems. PLP analysis takes four psychoacoustic factors into account: critical-band resolution, the masking effect, equal loudness, and the intensity-loudness law. This paper presents experimental results that illustrate the relative importance of each of these factors in the context of ASR. It is also shown that the (J) SRU filter bank can be incorporated into the PLP process with very similar overall results. The ASR system is based on dynamic time warping (DTW), and the test vocabulary consists of the letters of the alphabet and the digits zero through nine.
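For readers unfamiliar with DTW-based recognition, the following is a minimal sketch of template matching over sequences of feature vectors (for example, per-frame PLP coefficients). It is illustrative only and is not the system used in the experiments: the Euclidean local distance, the unconstrained symmetric step pattern, and the names `dtw_distance` and `recognize` are all assumptions made here.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumptions (not taken from the paper): Euclidean local distance,
    steps of (1,0), (0,1) and (1,1), no slope or window constraints.
    """
    n, m = len(a), len(b)
    # D[i][j] = minimum cost of aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance.

    `templates` is a hypothetical dict mapping a word label to one
    reference sequence of feature vectors.
    """
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

In a setup of this kind, each vocabulary item (a letter or a digit) would contribute one or more reference templates, and an unknown utterance is assigned the label of the template with the lowest accumulated cost.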