Hermitian based Hidden Activation Functions for Adaptation of Hybrid HMM/ANN Models

This work is concerned with speaker adaptation techniques for artificial neural networks (ANNs) implemented as feed-forward multi-layer perceptrons (MLPs) in the context of large vocabulary continuous speech recognition (LVCSR). Most successful speaker adaptation techniques for MLPs augment the neural architecture with a linear transformation network connected to either the input or the output layer. The weights of this additional linear layer are learned during the adaptation phase while all other weights are kept frozen in order to avoid over-fitting. As a result, the structures of the speaker-dependent (SD) and speaker-independent (SI) architectures differ, and the number of adaptation parameters depends on the dimension of either the input or the output layer. We propose an alternative neural architecture for speaker adaptation that overcomes these limitations. This architecture adopts hidden activation functions that can be learned directly from the adaptation data. The adaptive capability of the hidden activation functions is achieved through the use of orthonormal Hermite polynomials. Experimental evidence gathered on the Wall Street Journal Nov92 task demonstrates the viability of the proposed technique.
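
To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of a hidden layer whose activation function is a per-unit learnable combination of orthonormal Hermite functions; the PyTorch framing, class names, layer sizes, and initialisation are illustrative assumptions. During adaptation only the Hermite coefficients are updated, so the SI and SD networks share the same topology.

```python
# Sketch only: adaptive Hermite activations for a hidden layer.
# All names, dimensions, and hyper-parameters below are illustrative.
import math
import torch
import torch.nn as nn


def hermite_functions(x: torch.Tensor, order: int) -> torch.Tensor:
    """Orthonormal Hermite functions psi_0..psi_{order-1} evaluated at x.

    psi_n(x) = H_n(x) * exp(-x^2 / 2) / sqrt(2^n * n! * sqrt(pi)),
    with the physicists' Hermite polynomials H_n built by recurrence.
    """
    H = [torch.ones_like(x), 2.0 * x]                      # H_0, H_1
    for n in range(1, order - 1):                          # H_{n+1} = 2x H_n - 2n H_{n-1}
        H.append(2.0 * x * H[n] - 2.0 * n * H[n - 1])
    gauss = torch.exp(-0.5 * x * x)
    psi = [
        H[n] * gauss / math.sqrt(2.0 ** n * math.factorial(n) * math.sqrt(math.pi))
        for n in range(order)
    ]
    return torch.stack(psi, dim=-1)                        # (..., order)


class HermiteHiddenLayer(nn.Module):
    """Affine transform followed by a per-unit adaptive Hermite activation."""

    def __init__(self, in_dim: int, hidden_dim: int, order: int = 8):
        super().__init__()
        self.linear = nn.Linear(in_dim, hidden_dim)
        self.order = order
        # One coefficient vector per hidden unit (illustrative initialisation).
        self.coeffs = nn.Parameter(0.1 * torch.randn(hidden_dim, order))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.linear(x)                                 # (batch, hidden_dim)
        psi = hermite_functions(a, self.order)             # (batch, hidden_dim, order)
        return (psi * self.coeffs).sum(dim=-1)             # learned activation per unit


# Speaker adaptation: freeze the connection weights, update only the coefficients.
layer = HermiteHiddenLayer(in_dim=39, hidden_dim=512)
for p in layer.parameters():
    p.requires_grad = False
layer.coeffs.requires_grad = True
optimizer = torch.optim.SGD([layer.coeffs], lr=1e-3)
```

In this formulation the number of speaker-specific parameters scales with the number of hidden units and the polynomial order rather than with the input or output dimensionality, and no additional layer is grafted onto the SI network.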
