Speaker selection training for large vocabulary continuous speech recognition

Acoustic variability across speakers is one of the main challenges for speaker-independent (SI) speech recognition systems. Dominant speaker adaptation techniques such as MLLR and MAP, although powerful, become inefficient when too little enrollment data is available. In this paper, we propose an adaptation method based on speaker selection training, which makes full use of the statistics of the training corpus. A relative error rate reduction of 5.31% is achieved when only one utterance is available. We compare different speaker selection strategies, namely PCA-, HMM- and GMM-based methods. In addition, the impact of the number of selected cohort speakers and the number of utterances from the target speaker is investigated. Furthermore, comparison and integration with MLLR adaptation are also presented. Finally, we discuss ongoing work such as dynamically varying the number of selected speakers, measuring the relative contribution of each selected speaker, and speeding up the computationally expensive re-estimation procedure through model synthesis.
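
To make the GMM-based selection strategy mentioned above concrete, the sketch below scores a target speaker's enrollment utterance against a per-speaker GMM for every training speaker and keeps the N highest-scoring speakers as the cohort. This is a minimal illustration under our own assumptions, not the paper's exact recipe: the use of scikit-learn's GaussianMixture, the toy MFCC-like features, and the helper names train_speaker_gmms and select_cohort are all hypothetical.

    # Illustrative sketch of GMM-based cohort speaker selection (assumptions
    # noted in the text above): each training speaker is modeled by a GMM
    # over acoustic feature vectors, the target speaker's enrollment frames
    # are scored against every speaker GMM, and the top-N speakers are kept
    # as the cohort whose data would then drive re-estimation of the model.

    import numpy as np
    from sklearn.mixture import GaussianMixture


    def train_speaker_gmms(speaker_features, n_components=8, seed=0):
        """Fit one diagonal-covariance GMM per training speaker.

        speaker_features: dict mapping speaker id -> (n_frames, n_dims) array.
        """
        gmms = {}
        for spk, feats in speaker_features.items():
            gmm = GaussianMixture(n_components=n_components,
                                  covariance_type="diag",
                                  random_state=seed)
            gmms[spk] = gmm.fit(feats)
        return gmms


    def select_cohort(gmms, target_features, n_cohort=20):
        """Rank training speakers by average log-likelihood of the target
        speaker's enrollment frames and keep the top n_cohort speakers."""
        scores = {spk: gmm.score(target_features)  # mean log-likelihood per frame
                  for spk, gmm in gmms.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        return ranked[:n_cohort]


    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Toy data: 50 training speakers with 39-dimensional MFCC-like features.
        train = {f"spk{i:03d}": rng.normal(i % 5, 1.0, size=(200, 39))
                 for i in range(50)}
        target = rng.normal(2, 1.0, size=(100, 39))  # one enrollment utterance
        cohort = select_cohort(train_speaker_gmms(train), target, n_cohort=10)
        print("Selected cohort speakers:", cohort)

In the PCA- or HMM-based variants described in the paper, only the scoring step would change (e.g. distance in an eigenvoice space or likelihood under speaker-adapted HMMs); the ranking and top-N selection would remain the same.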