Paper information - On speaker adaptive training of artificial neural networks

On speaker adaptive training of artificial neural networks

In this paper we present two techniques that improve the recognition accuracy of multilayer perceptron artificial neural networks (MLP ANN) by adopting Speaker Adaptive Training (SAT). MLP ANNs, usually in combination with the TRAPS parametrization, are used in speech recognition tasks, in the production of discriminative features for GMM-HMM systems, and elsewhere. In the first SAT experiments, we used VTLN as the speaker normalization technique. Moreover, we developed a novel speaker normalization technique called Minimum Error Linear Transform (MELT), which resembles the cMLLR/fMLLR method [1] in that it can be applied to either the model or the features. We tested these two methods extensively on the telephone speech corpus SpeechDat-East. The results of these experiments suggest that incorporating SAT into the MLP ANN training process is beneficial and, depending on the setup, leads to a significant decrease in phoneme error rate (3% to 8% absolute, 12% to 25% relative).

Index Terms: speaker adaptive training, SAT, TRAPS, VTLN, neural network, phoneme recognition

Jan Zelinka | Ludek Müller | Jan Trmal

[1] Li Lee, et al. A frequency warping approach to speaker normalization, 1998, IEEE Trans. Speech Audio Process.

[2] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.

[3] Alejandro Acero, et al. Acoustical and environmental robustness in automatic speech recognition, 1991.

[4] Pavel Matejka, et al. Towards Lower Error Rates in Phoneme Recognition, 2004, TSD.

[5] Hynek Hermansky, et al. Temporal patterns (TRAPs) in ASR of noisy speech, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99.

[6] Hermann Ney, et al. Vocal tract normalization equals linear transformation in cepstral space, 2001, IEEE Transactions on Speech and Audio Processing.

[7] Lukás Burget, et al. Investigation into bottle-neck features for meeting speech recognition, 2009, INTERSPEECH.

[8] Mark J. F. Gales, et al. Variance compensation within the MLLR framework, 1996.