Language Modeling for Spoken Dialogue System Based on Sentence Filtering Using Predicate-Argument Structures

A novel text selection approach for training a language model (LM) with Web texts is proposed for automatic speech recognition (ASR) in spoken dialogue systems. Compared with the conventional approach based on the perplexity criterion, the proposed approach introduces a semantic-level relevance measure based on the back-end knowledge base used in the dialogue system. We focus on predicate-argument (P-A) structures characteristic of the domain in order to filter semantically relevant sentences. Moreover, combination with the perplexity measure is investigated. Experimental evaluations in two different domains demonstrate the effectiveness and generality of the proposed approach. The combined method achieves significant improvements not only in ASR accuracy but also in semantic-level accuracy.
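The selection idea can be outlined in pseudocode as a two-stage filter: a semantic filter that keeps sentences whose P-A pairs overlap with those extracted from the domain knowledge base, followed by the conventional perplexity filter. The sketch below is illustrative only; the callables `parse_pa_pairs` and `lm_perplexity`, the set `domain_pa_set`, and the threshold values are assumed interfaces, not components described in the paper.

```python
def select_sentences(web_sentences, parse_pa_pairs, domain_pa_set,
                     lm_perplexity, pa_threshold=1, ppl_threshold=200.0):
    """Select Web sentences that are both semantically and statistically
    relevant to the target domain (illustrative sketch)."""
    selected = []
    for sent in web_sentences:
        # Semantic filter: count P-A pairs of the sentence that also occur
        # in the P-A pairs extracted from the domain knowledge base.
        pa_pairs = parse_pa_pairs(sent)          # assumed P-A parser
        matches = sum(1 for pa in pa_pairs if pa in domain_pa_set)
        if matches < pa_threshold:
            continue
        # Statistical filter: conventional perplexity criterion computed
        # with a seed in-domain LM.
        if lm_perplexity(sent) > ppl_threshold:  # assumed LM scorer
            continue
        selected.append(sent)
    return selected
```

The retained sentences would then be used to train (or adapt) the ASR language model; the combination with the perplexity measure corresponds to applying both filters rather than either one alone.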