Speaking rate compensation based on likelihood criterion in acoustic model training and decoding

In this paper, we propose a speaking rate compensation method based on frame period and frame length adaptation. The method decodes an input utterance using several sets of frame period and frame length parameters for speech analysis, then selects the set that yields the highest score, where the score consists of the acoustic likelihood normalized by frame period, the language likelihood, and the insertion penalty. We also apply this approach to acoustic model training: we calculate the acoustic likelihood for each frame period and frame length using Viterbi alignment and select the best setting for each training utterance. Applying the proposed speaking rate compensation to both acoustic model training and decoding improved accuracy by 2.9% (absolute) on a spontaneous lecture speech recognition task.
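
The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact normalization, so the code assumes the summed acoustic log-likelihood is scaled by the ratio of the frame period to a reference period (here 10 ms), which keeps settings with more frames from being penalized simply for having more terms in the sum. The names `Hypothesis`, `combined_score`, `select_best`, `lm_weight`, and `insertion_penalty` are hypothetical, and the decoder outputs are assumed to be available as log-domain scores.

```python
from dataclasses import dataclass

# Assumed baseline frame period; the abstract does not state a reference value.
REFERENCE_PERIOD_MS = 10.0


@dataclass
class Hypothesis:
    """Decoder output for one (frame period, frame length) setting."""
    frame_period_ms: float
    frame_length_ms: float
    acoustic_loglik: float  # acoustic log-likelihood summed over all frames
    language_loglik: float  # language model log-likelihood
    num_words: int          # number of words in the recognized hypothesis


def combined_score(h, lm_weight=1.0, insertion_penalty=0.0):
    """Score = normalized acoustic + weighted language + insertion penalty.

    Assumption: normalization multiplies the summed acoustic log-likelihood
    by frame_period / reference_period, so scores obtained with different
    numbers of frames become comparable.
    """
    normalized_acoustic = h.acoustic_loglik * (
        h.frame_period_ms / REFERENCE_PERIOD_MS
    )
    return (
        normalized_acoustic
        + lm_weight * h.language_loglik
        + insertion_penalty * h.num_words
    )


def select_best(hypotheses, lm_weight=1.0, insertion_penalty=0.0):
    """Return the hypothesis with the highest combined score."""
    return max(
        hypotheses,
        key=lambda h: combined_score(h, lm_weight, insertion_penalty),
    )
```

Under the same assumption, the training-side selection would reuse the normalized acoustic term alone, computed from the Viterbi alignment of each training utterance against its transcription, since the language likelihood and insertion penalty do not vary when the word sequence is fixed.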