Adaptation of Hybrid ANN/HMM Models Using Linear Hidden Transformations and Conservative Training

A technique is proposed for the adaptation of automatic speech recognition systems using hybrid models combining artificial neural networks with hidden Markov models. The application of linear transformations not only to the input features, but also to the outputs of the internal layers is investigated. The motivation is that the outputs of an internal layer represent a projection of the input pattern into a space where it should be easier to learn the classification or transformation expected at the output of the network. A new solution, called conservative training, is proposed that compensates for the lack of adaptation samples in certain classes. Supervised adaptation experiments with different corpora and for different adaptation types are described. The results show that the proposed approach always outperforms the use of transformations in the feature space and yields even better results when combined with linear input transformations.
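
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the one-hidden-layer network, the `tanh` activation, the layer sizes, and every name (`forward`, `conservative_targets`, `A`, `c`) are assumptions made for illustration. The inserted linear transformation is initialized to the identity, so the adapted network starts out reproducing the original one; during adaptation only `A` and `c` would be updated. The target construction shown for conservative training is one plausible reading of "compensating for the lack of adaptation samples": classes absent from the adaptation set keep the posterior assigned by the original network instead of a zero target.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, W1, b1, W2, b2, A=None, c=None):
    # Hypothetical one-hidden-layer network producing class posteriors.
    h = np.tanh(x @ W1 + b1)
    if A is not None:
        # Linear transformation applied to the hidden-layer outputs.
        # Assumption: only A and c are trained during adaptation.
        h = h @ A + c
    return softmax(h @ W2 + b2)

def conservative_targets(original_posteriors, label, missing):
    # Assumed target construction for one frame:
    # - classes missing from the adaptation set keep the original posterior,
    # - the labelled class gets the remaining probability mass,
    # - all other classes get zero.
    # Assumes `label` is not in `missing`.
    t = np.zeros_like(original_posteriors)
    t[missing] = original_posteriors[missing]
    t[label] = 1.0 - t[missing].sum()
    return t

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 5, 3
W1 = rng.normal(size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(size=(n_hid, n_out)); b2 = np.zeros(n_out)
x = rng.normal(size=(2, n_in))

# Identity initialization: the adapted network equals the original one.
A = np.eye(n_hid); c = np.zeros(n_hid)
baseline = forward(x, W1, b1, W2, b2)
adapted = forward(x, W1, b1, W2, b2, A, c)

# Targets for the first frame, labelled class 0, with class 2 unseen.
targets = conservative_targets(baseline[0], label=0, missing=[2])
```

Starting from the identity means adaptation begins from the unadapted model's behaviour, and keeping the original weights frozen limits how far a small adaptation set can move the network.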
