Clarification Question Generation for Speech Recognition Error Recovery Using Monolingual SMT

Clarification dialogue is an efficient and direct way of handling speech recognition errors in speech interface applications. In this paper we present a new approach to Clarification Question (CQ) generation. A monolingual phrase-based SMT (PB-SMT) framework is introduced to generate robust and flexible CQs. A parallel corpus mapping simulated errors to manually annotated CQs is built and used to train the model. A new type of generalized phrase pair is expanded from the conventional translation phrase table. Combining both generalized and conventional phrase pairs, a two-step decoding process is carried out to generate CQs. Both manual and automatic metrics are used to evaluate the quality of the generated CQs. Experimental results show that our method can effectively generate reasonable CQs from misrecognized utterances, and that the generated CQs can be used to prompt a clarification dialogue for error handling.