Accuracy versus complexity in context dependent phone modeling

This paper presents two different approaches to building HMMs that give sufficient acoustic resolution and fit within limited user resources. Both scale down acoustic models built with tied Gaussian HMMs. The total number of Gaussians is reduced by pairwise merging, and the number of Gaussians per state is reduced by selecting them according to the so-called occupancy criterion. Experiments carried out on the WSJ recognition task show that, after scaling down, no further training is needed when the total number of Gaussians or the number of Gaussians per state is reduced by up to a factor of three. This is an advantage because retraining cannot be carried out by the final system user.
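
The two reduction steps named above can be illustrated with a minimal sketch. It is not the method used in this paper: the abstract does not give the merging formula, the rule for choosing which pair to merge, or the exact form of the occupancy criterion. The sketch assumes diagonal-covariance Gaussians, a moment-matching merge weighted by occupancy counts, and a selection that keeps the Gaussians with the highest occupancy in a state and renormalizes their weights. The names `merge_pair` and `select_by_occupancy` are hypothetical.

```python
from typing import List, Tuple

# A Gaussian is (occupancy count, mean vector, diagonal variance vector).
# This representation is an assumption made for illustration only.
Gaussian = Tuple[float, List[float], List[float]]


def merge_pair(a: Gaussian, b: Gaussian) -> Gaussian:
    """Merge two diagonal Gaussians by occupancy-weighted moment matching.

    The merged Gaussian has the same first and second moments as the
    occupancy-weighted mixture of the two inputs (assumed merge rule).
    """
    wa, mean_a, var_a = a
    wb, mean_b, var_b = b
    w = wa + wb
    mean = [(wa * ma + wb * mb) / w for ma, mb in zip(mean_a, mean_b)]
    var = [
        (wa * (va + ma * ma) + wb * (vb + mb * mb)) / w - m * m
        for ma, va, mb, vb, m in zip(mean_a, var_a, mean_b, var_b, mean)
    ]
    return (w, mean, var)


def select_by_occupancy(occupancies: List[float], keep: int) -> List[Tuple[int, float]]:
    """Keep the `keep` Gaussians of one state with the highest occupancy.

    Returns (index, renormalized weight) pairs in the original index order
    (assumed selection rule).
    """
    ranked = sorted(range(len(occupancies)), key=lambda i: occupancies[i], reverse=True)
    kept = sorted(ranked[:keep])
    total = sum(occupancies[i] for i in kept)
    return [(i, occupancies[i] / total) for i in kept]


if __name__ == "__main__":
    merged = merge_pair((1.0, [0.0], [1.0]), (1.0, [2.0], [1.0]))
    print(merged)
    print(select_by_occupancy([5.0, 1.0, 4.0], keep=2))
```

Neither step needs the training data, only statistics already stored with the trained models, which is consistent with the claim that the scaled-down models can be used without retraining.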