An MLP/HMM hybrid model using nonlinear predictors

Abstract In this paper, we propose an MLP/HMM hybrid model in which the input feature vectors are transformed by nonlinear predictors, implemented as multilayer perceptrons (MLPs), assigned to each state of a hidden Markov model (HMM). The prediction error vectors in the states are modeled by Gaussian mixture densities. The hybrid model is motivated by the need to model the prediction errors of the conventional neural prediction model (NPM), in which the errors vary with phonetic context and speaker identity. The MLP/HMM hybrid model is advantageous because the MLP predictors exploit frame correlation in the input speech signal, while the variability of the prediction error signals is explicitly modeled. We present training algorithms based on the maximum likelihood (ML) criterion and on a discriminative criterion for minimum-error classification. Experiments were conducted on speaker-independent continuous speech recognition. ML training of the hybrid model yielded much better performance than a conventional NPM, which does not explicitly model the prediction error signals. Training with the discriminative criterion significantly reduced confusion among different models and lowered the word error rate by 56% compared with ML training.
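To make the model structure concrete, the following equations sketch the per-state emission density implied by the abstract. The notation is assumed rather than taken from the paper: $f_j$ denotes the MLP predictor of state $j$ with parameters $\theta_j$, the predictor is fed a window of $p$ preceding frames, and $c_{jm}$, $\mu_{jm}$, $\Sigma_{jm}$ are the mixture weights, means, and covariances of the Gaussian mixture in state $j$.

% Assumed notation: x_t is the feature vector at frame t, f_j the MLP
% predictor of state j, and p the (assumed) number of preceding frames
% used as predictor input.
\begin{align}
  e_t^{(j)} &= x_t - f_j\bigl(x_{t-1}, \dots, x_{t-p}; \theta_j\bigr)
    && \text{prediction error in state } j \\
  b_j(x_t) &= \sum_{m=1}^{M} c_{jm}\,
    \mathcal{N}\bigl(e_t^{(j)}; \mu_{jm}, \Sigma_{jm}\bigr)
    && \text{Gaussian mixture density over the error}
\end{align}

Under this reading, the hybrid differs from a conventional NPM in that the residual $e_t^{(j)}$ is scored by a state-specific mixture density $b_j(x_t)$ rather than being used directly as a distance measure.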