The Application of Bayesian Information Criterion in Acoustic Model Refinement

Automatic speech recognition (ASR) systems usually consist of an acoustic model and a language model. This paper describes a technique for deploying the acoustic model parameters efficiently. The acoustic model is typically built on Continuous Density Hidden Markov Models (CDHMMs). The output probability of each CDHMM state is represented by a Gaussian mixture density with a diagonal covariance structure. Usually, the output probability density function of every CDHMM state contains the same number of mixture components, although using a different number of components in individual states may yield more accurate recognition results, especially for low-resource ASR systems. The central idea is to assign more components to states where they are effective and fewer components to states where adding components does not yield a significantly better description of the training data. The number of mixture components for a particular CDHMM state is chosen by optimizing the Bayesian Information Criterion (BIC).
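
The per-state selection described above can be illustrated with a minimal sketch. It assumes the common form BIC = -2 log L + k log N, where lower values are better; the paper may use a different sign convention or a tunable penalty weight. The function names, the parameter-counting rule, and the input layout (a mapping from candidate component count to the training log-likelihood of one state) are hypothetical and are not taken from the paper.

```python
import math


def diag_gmm_param_count(n_components, dim):
    """Free parameters of a diagonal-covariance GMM (assumed counting rule):
    one mean vector and one variance vector per component, plus the
    mixture weights, which sum to one and so contribute n_components - 1."""
    return n_components * 2 * dim + (n_components - 1)


def bic(log_likelihood, n_params, n_frames):
    """BIC in the 'lower is better' convention: -2 log L + k log N."""
    return -2.0 * log_likelihood + n_params * math.log(n_frames)


def select_component_count(log_likelihoods, dim, n_frames):
    """Pick the component count with the lowest BIC for a single state.

    log_likelihoods: dict mapping a candidate component count to the total
    log-likelihood of the frames aligned to this state under a GMM of
    that size (assumed to have been trained beforehand).
    Returns (best_count, scores), where scores maps each count to its BIC.
    """
    scores = {
        m: bic(ll, diag_gmm_param_count(m, dim), n_frames)
        for m, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

For example, with made-up log-likelihoods `{1: -1000.0, 2: -900.0, 4: -895.0}` for a state with 100 two-dimensional frames, the sketch selects 2 components: going from 2 to 4 components raises the likelihood only slightly, so the penalty term outweighs the gain.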