Language model acquisition from a text corpus for speech understanding

Speech understanding can be viewed as the problem of translating natural language, in the form of speech recognition results, into an output semantic language. This paper describes a stochastic method for automatically acquiring, from a text corpus, a language model for this translation. The method estimates the co-occurrence probabilities of input and output grammar rules and uses them as the translation language model. Because the amount of available text is limited, a reliable model is difficult to estimate. We therefore propose a method for modeling the input and output grammars concisely, so that the translation model can be estimated reliably. Experiments on the ARPA ATIS task show that the method is effective.
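
As a rough illustration of what estimating co-occurrence probabilities of input and output grammar rules could look like, the following minimal sketch computes a relative-frequency estimate of P(output rule | input rule) from paired rule sets. It is an assumption-laden toy, not the formulation used in the paper: the function name `estimate_cooccurrence`, the rule strings, and the two-sentence corpus are all hypothetical, and no smoothing or grammar compaction is applied.

```python
from collections import Counter, defaultdict


def estimate_cooccurrence(pairs):
    """Estimate P(output_rule | input_rule) by relative frequency.

    `pairs` is an iterable of (input_rules, output_rules) tuples, one per
    sentence, where each element is a list of grammar-rule strings.
    This is a hypothetical helper for illustration only.
    """
    joint = defaultdict(Counter)
    for input_rules, output_rules in pairs:
        # Count each input/output rule pair once per sentence.
        for r_in in set(input_rules):
            for r_out in set(output_rules):
                joint[r_in][r_out] += 1

    model = {}
    for r_in, counts in joint.items():
        total = sum(counts.values())
        # Normalize the co-occurrence counts for this input rule.
        model[r_in] = {r_out: c / total for r_out, c in counts.items()}
    return model


# Toy corpus invented for this example (not ATIS data).
corpus = [
    (["S -> show NP", "NP -> flights"], ["QUERY -> list(X)", "X -> flight"]),
    (["S -> show NP", "NP -> fares"], ["QUERY -> list(X)", "X -> fare"]),
]

model = estimate_cooccurrence(corpus)
for r_in, dist in model.items():
    print(r_in, dist)
```

With so few observations, each estimate rests on one or two counts, which illustrates why a limited corpus makes the model unreliable and why reducing the number of distinct rules, as the abstract proposes, would leave more data per parameter.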