Speech recognition with a generative factor analyzed hidden Markov model

We present a generative factor analyzed hidden Markov model (GFA-HMM) for automatic speech recognition. In a traditional HMM, the observation vectors are represented by mixtures of Gaussians (MoG) that depend on a discrete-valued hidden state sequence. The GFA-HMM introduces a hierarchy of continuous-valued latent representations of the observation vectors, where latent vectors at one level are acoustic-unit dependent and latent vectors at a higher level are acoustic-unit independent. An expectation maximization (EM) algorithm is derived for maximum likelihood estimation of the model parameters. The GFA-HMM achieves a much more compact representation of the intra-frame statistics of observation vectors than a traditional HMM. We conducted an experiment showing that the GFA-HMM can achieve better performance than a traditional HMM with the same amount of training data but a much smaller number of model parameters.
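
To make the observation model concrete, the following is a minimal sketch of a per-state factor-analysis likelihood of the kind this family of models builds on. The notation (loading matrix $\boldsymbol{\Lambda}_j$, latent factor $\mathbf{x}_t$, diagonal noise covariance $\boldsymbol{\Psi}_j$) is generic factor-analysis notation assumed here for illustration; the GFA-HMM's two-level latent hierarchy and parameter tying are defined in the body of the paper.

\[
\mathbf{x}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
\qquad
\mathbf{o}_t \mid s_t = j,\ \mathbf{x}_t \sim \mathcal{N}\!\left(\boldsymbol{\mu}_j + \boldsymbol{\Lambda}_j \mathbf{x}_t,\ \boldsymbol{\Psi}_j\right),
\]
so that marginalizing the latent factor gives a Gaussian with structured covariance,
\[
\mathbf{o}_t \mid s_t = j \sim \mathcal{N}\!\left(\boldsymbol{\mu}_j,\ \boldsymbol{\Lambda}_j \boldsymbol{\Lambda}_j^{\top} + \boldsymbol{\Psi}_j\right),
\]
which can model intra-frame correlations with far fewer parameters than a full covariance matrix when the latent dimension is small relative to the observation dimension.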