A second-order HMM for high performance word and phoneme-based continuous speech recognition

In speech recognition by stochastic methods, the conventional approach is to use first-order hidden Markov models (HMM1s). Despite the success of this approach, it is still worth investigating whether some of the drawbacks of HMM1s can be overcome, e.g. by using higher-order Markov processes. In this paper, we show that second-order hidden Markov models (HMM2s) can yield high performance in continuous speech recognition. We first present the underlying equations and complexity of HMM2s in the maximum likelihood estimation (MLE) paradigm. Then, we show that in a connected word recognition task, such as spelled name recognition over the telephone, HMM2s outperform HMM1s. In phoneme-based continuous speech recognition, we show that context-independent HMM2s can achieve more than 69% phone accuracy.