Speaker adaptive speech recognition using phone pair model

In acoustic feature space, the positions of phones shift considerably from speaker to speaker, and even within a single speaker they vary considerably from utterance to utterance. Their positions relative to one another, however, are considered to be comparatively stable. In this paper, we propose a phone pair model, which uses the joint probability of the acoustic feature vectors of two phones to represent the relationships between phone positions. Two recognition experiments, one using phone HMMs alone and the other combining phone HMMs with the phone pair model, indicated that the phone pair model makes efficient speaker adaptation possible. The experiments also showed the robustness of the model in speech recognition.
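
The idea of modeling two phones' feature vectors jointly can be illustrated with a small sketch. The abstract does not specify the form of the joint probability, so the code below assumes, purely for illustration, a single joint Gaussian over the concatenated feature vectors of one phone pair, fitted to synthetic data in which each speaker shifts both phones by the same offset. The function names `fit_pair_model` and `predict_partner`, the data, and the use of the conditional mean as an adaptation step are all hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_pair_model(feats_a, feats_b):
    """Fit a joint Gaussian to paired feature vectors of two phones.

    feats_a, feats_b: arrays of shape (n_speakers, dim), one row per speaker.
    Returns the mean and covariance of the concatenated vectors [a, b].
    """
    z = np.hstack([feats_a, feats_b])
    return z.mean(axis=0), np.cov(z, rowvar=False)

def predict_partner(mean, cov, obs_a):
    """Conditional mean of phone B given an observed vector for phone A.

    Assumes both phones use feature vectors of the same dimension.
    """
    d = obs_a.shape[0]
    mu_a, mu_b = mean[:d], mean[d:]
    cov_aa, cov_ba = cov[:d, :d], cov[d:, :d]
    return mu_b + cov_ba @ np.linalg.solve(cov_aa, obs_a - mu_a)

# Synthetic data (an assumption): each speaker shifts both phones by a
# common offset, so absolute positions vary but relative positions are stable.
rng = np.random.default_rng(0)
n_speakers, dim = 200, 2
base_a = np.array([1.0, 0.0])
base_b = np.array([-1.0, 2.0])
shift = rng.normal(0.0, 1.0, size=(n_speakers, dim))
feats_a = base_a + shift + rng.normal(0.0, 0.1, size=(n_speakers, dim))
feats_b = base_b + shift + rng.normal(0.0, 0.1, size=(n_speakers, dim))

mean, cov = fit_pair_model(feats_a, feats_b)

# A new speaker with a large offset; only phone A has been observed.
new_shift = np.array([2.0, -1.5])
obs_a = base_a + new_shift
true_b = base_b + new_shift

adapted_b = predict_partner(mean, cov, obs_a)
err_adapted = np.linalg.norm(adapted_b - true_b)
err_unadapted = np.linalg.norm(mean[dim:] - true_b)
print(err_adapted, err_unadapted)
```

Under these assumptions, a single observation of one phone moves the estimate of the other phone toward the new speaker's position, whereas the speaker-independent mean stays where it was. That is the intuition behind exploiting stable relative positions for adaptation, shown here in a deliberately simplified setting.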