On optimal order in modeling sequence of letters in words of common language as a Markov chain

Abstract. In the recognition of words of a language such as English, the letter sequences of the words are often modeled as Markov chains. This paper addresses the problem of determining the optimal order of such Markov chains using two approaches: Tong's minimum Akaike information criterion estimate (MAICE) and Hoel's hypothesis test based on the likelihood ratio statistic. Simulation results show that the sequence of letters in English words is more likely to be a second-order Markov chain than a first-order one.
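To make the idea of order selection by minimum AIC concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes the generic form AIC(k) = -2 log L_k + 2 s^k (s - 1) for an order-k chain over an alphabet of s letters, fits transition probabilities by relative frequency, and scores every candidate order on the same letter positions so the likelihoods are comparable. The function names `markov_log_likelihood`, `aic`, and `maice_order`, and the toy input, are illustrative assumptions.

```python
import math
from collections import Counter


def markov_log_likelihood(words, order, start):
    """Maximized log-likelihood of an order-`order` chain fitted to `words`.

    Only letters at positions >= `start` are scored, so that chains of
    different orders are compared on exactly the same observations.
    """
    context_counts = Counter()
    transition_counts = Counter()
    for word in words:
        for i in range(start, len(word)):
            context = word[i - order:i]  # empty string when order == 0
            transition_counts[(context, word[i])] += 1
            context_counts[context] += 1
    # Relative-frequency (maximum likelihood) estimate of each transition.
    return sum(
        n * math.log(n / context_counts[context])
        for (context, _), n in transition_counts.items()
    )


def aic(words, order, max_order, alphabet_size):
    """Generic AIC: -2 * log-likelihood + 2 * number of free parameters."""
    free_params = alphabet_size ** order * (alphabet_size - 1)
    log_lik = markov_log_likelihood(words, order, start=max_order)
    return -2.0 * log_lik + 2.0 * free_params


def maice_order(words, max_order, alphabet_size):
    """Return the candidate order in 0..max_order with the smallest AIC."""
    return min(
        range(max_order + 1),
        key=lambda k: aic(words, k, max_order, alphabet_size),
    )


if __name__ == "__main__":
    # Toy two-letter "language": each letter is fully determined by the
    # previous two letters, but not by the previous one alone.
    toy_words = ["aabb" * 100]
    print(maice_order(toy_words, max_order=3, alphabet_size=2))
```

With real English text the alphabet size would be 26 (or more, if case and punctuation are kept), and the penalty term grows as 26^k, so a large word list is needed before higher orders can be supported by the data.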