Speaker adaptation using regularization and network adaptation for hybrid MMI-NN/HMM speech recognition

This paper describes how to perform speaker adaptation for a hybrid large-vocabulary speech recognition system. The hybrid system is based on a Maximum Mutual Information Neural Network (MMINN), which serves as a Vector Quantizer (VQ) for a discrete-HMM speech recognizer. The combination of MMINNs and HMMs has shown good performance on several large-vocabulary speech recognition tasks such as RM and WSJ. This paper presents two approaches to speaker adaptation with this hybrid system. The first is a transformation of the feature space, performed by a neural network trained with maximum likelihood (ML) as the objective function for the complete system; that is, the parameters of the NN are estimated to match the HMM parameters of the pre-trained speaker-independent system. The second is to adapt the HMM parameters in proportion to the amount of adaptation data available per HMM, using a regularization approach. Both approaches can be applied jointly, which further improves recognition accuracy.
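The count-dependent HMM adaptation can be illustrated with a minimal sketch. The function below is not the paper's implementation; it is a hypothetical example of regularized adaptation for a discrete HMM, where each state's emission distribution over VQ codewords is interpolated between the speaker-independent (SI) estimate and the speaker-specific ML estimate, with the interpolation weight growing with the amount of adaptation data for that state. The function name and the regularization weight `tau` are assumptions for illustration.

```python
import numpy as np

def adapt_discrete_hmm(p_si, counts, tau=50.0):
    """Regularized adaptation of discrete-HMM emission probabilities (sketch).

    p_si:   (n_states, n_codewords) speaker-independent emission probabilities
    counts: (n_states, n_codewords) VQ-label counts from the adaptation data
    tau:    regularization weight (hypothetical value); larger tau pulls the
            result more strongly toward the SI parameters
    """
    # Amount of adaptation data observed per HMM state.
    n = counts.sum(axis=1, keepdims=True)
    # Speaker-specific ML estimate; states with no data fall back to zeros.
    p_ml = np.divide(counts, np.maximum(n, 1))
    # Interpolation weight: approaches 1 as per-state data grows,
    # so well-observed states trust the ML estimate more.
    lam = n / (n + tau)
    return lam * p_ml + (1.0 - lam) * p_si
```

With no adaptation data for a state, the SI distribution is returned unchanged; with abundant data, the result approaches the speaker-specific ML estimate.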