Using semantic class information for rapid development of language models within ASR dialogue systems

When dialogue system developers tackle a new domain, much effort is required, and the development of the different parts of the system usually proceeds independently. Yet it may be profitable to coordinate development efforts across modules. We focus on extending small amounts of language model training data by integrating semantic classes originally created for a natural language understanding module. By converting finite-state parses of a training corpus into a probabilistic context-free grammar and subsequently generating artificial data from that grammar, we can significantly reduce perplexity and automatic speech recognition (ASR) word error rate in situations with little training data. Experiments are presented using data from the ATIS and DARPA Communicator travel corpora.
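As a rough illustration of the data-generation step, the sketch below samples artificial sentences from a probabilistic context-free grammar in which semantic classes appear as nonterminals. The grammar, class names, and probabilities here are invented for a toy travel domain and are not from the paper; in the approach described above, the rules and their probabilities would instead be estimated from finite-state parses of the (small) training corpus.

```python
import random

# Hypothetical toy PCFG for a travel domain (illustrative only): each
# nonterminal maps to a list of (right-hand side, probability) pairs.
# Semantic classes such as CITY or AIRLINE act as nonterminals.
PCFG = {
    "S": [(["i", "want", "to", "fly", "FROM_CITY", "TO_CITY"], 0.6),
          (["show", "me", "AIRLINE", "flights", "TO_CITY"], 0.4)],
    "FROM_CITY": [(["from", "CITY"], 1.0)],
    "TO_CITY":   [(["to", "CITY"], 1.0)],
    "CITY":      [(["boston"], 0.5), (["denver"], 0.3), (["atlanta"], 0.2)],
    "AIRLINE":   [(["united"], 0.5), (["delta"], 0.5)],
}

def expand(symbol):
    """Recursively expand a symbol; nonterminal rules are sampled by probability."""
    if symbol not in PCFG:  # terminal word
        return [symbol]
    rules, probs = zip(*PCFG[symbol])
    rhs = random.choices(rules, weights=probs, k=1)[0]
    words = []
    for sym in rhs:
        words.extend(expand(sym))
    return words

def generate_corpus(n=1000, start="S"):
    """Sample n artificial sentences to augment language model training data."""
    return [" ".join(expand(start)) for _ in range(n)]

if __name__ == "__main__":
    for sentence in generate_corpus(n=5):
        print(sentence)
```

The generated sentences would then simply be appended to the original corpus before n-gram estimation, so that word sequences licensed by the semantic classes but unseen in the small training set still receive probability mass.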