Language model representations for beam-search decoding

This paper presents an efficient representation of a bigram language model for a beam-search-based, continuous-speech, large-vocabulary HMM recognizer. The tree-based topology considered exploits a factorization of the bigram probability, derived from the bigram interpolation scheme, together with a tree organization of all the words that can follow a given one. Moreover, an optimization algorithm is applied to reduce the space requirements of the language model considerably. Experimental results are reported for two 10,000-word dictation tasks: radiological reporting (perplexity 27) and newspaper dictation (perplexity 120). In the radiological domain, 93% word accuracy is achieved with real-time response and 23 MB of process space; in the newspaper dictation domain, 88.1% word accuracy is achieved with a 1.41x real-time response and 38 MB of process space. All recognition tests were performed on an HP-735 workstation.
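
To make the factorization concrete, the toy Python sketch below assumes a linear-interpolation smoothing of the form P(w|v) = lam * f(w|v) + (1 - lam) * P(w); the paper's exact interpolation scheme and data structures are not reproduced here, and all names and the fixed weight lam are illustrative assumptions. The point is that f(w|v) is non-zero only for the observed successors of v, so every context can fall back on a single shared unigram distribution and needs explicit storage only for its observed successors.

from collections import defaultdict

class FactoredBigramLM:
    # Toy interpolated bigram LM:  P(w|v) = lam * f(w|v) + (1 - lam) * P(w).
    # Only observed successors of each context v are stored explicitly; any
    # unseen successor falls through to the shared unigram term.  The class
    # name and the fixed interpolation weight lam are illustrative only.

    def __init__(self, bigram_counts, unigram_counts, lam=0.7):
        self.lam = lam
        total = sum(unigram_counts.values())
        self.p_uni = {w: c / total for w, c in unigram_counts.items()}
        # f(w|v) = c(v, w) / c(v), kept only for observed (v, w) pairs
        ctx_totals = defaultdict(int)
        for (v, _w), c in bigram_counts.items():
            ctx_totals[v] += c
        self.successors = defaultdict(dict)
        for (v, w), c in bigram_counts.items():
            self.successors[v][w] = c / ctx_totals[v]

    def prob(self, w, v):
        # Seen bigram: interpolated estimate; unseen: unigram back-off only.
        f = self.successors.get(v, {}).get(w, 0.0)
        return self.lam * f + (1.0 - self.lam) * self.p_uni.get(w, 0.0)

# Usage: only "the" has explicit successor entries; "cat" as a context
# shares the unigram distribution with every other unseen context.
bigrams = {("the", "cat"): 3, ("the", "dog"): 1}
unigrams = {"the": 4, "cat": 3, "dog": 1}
lm = FactoredBigramLM(bigrams, unigrams)
print(lm.prob("cat", "the"))  # 0.7 * (3/4) + 0.3 * (3/8)
print(lm.prob("dog", "cat"))  # 0.3 * (1/8): shared unigram term only

In a tree-structured decoder this kind of factorization lets one successor tree, scored with the (1 - lam) * P(w) term, be shared across all contexts, with small per-word trees carrying only the lam * f(w|v) corrections; the space-optimization algorithm mentioned in the abstract would operate on top of such a representation.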