Efficient Grammar Generation and Tuning for Interactive Voice Response Applications

This paper presents a procedure for efficiently creating and tuning context-free grammars for directed-dialog speech applications using only spoken utterances from test users. We first describe a method for transcribing utterances with improved accuracy by post-processing the ASR n-best lists with higher-level knowledge sources and additional information from the application prompt. We then present a semantic categorizer for the transcriptions, a statistical filtering mechanism for modifying the grammars, and a mechanism that raises an alarm when a large influx of errors occurs. We also illustrate the additional improvements gained by using the semantic classification strength in a feedback loop to the transcription mechanism.