Multimodal dialogue in mobile local search

Speak4it℠ is a multimodal mobile search application that provides information about local businesses. Users can combine simultaneous speech and touch input to issue search queries or commands to the application. For example, a user might say "gas stations" while tracing a route on the touchscreen. In this demonstration, we describe how we extended our multimodal semantic processing architecture and application from a one-shot query system to a multimodal dialogue system that tracks dialogue state over multiple turns. We illustrate the capabilities and limitations of an information-state-based approach to multimodal interpretation. We provide interactive demonstrations of Speak4it on a tablet and a smartphone, and we explain the challenges of supporting true multimodal interaction in a deployed mobile service.
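
To make the contrast between a one-shot query and state tracked over multiple turns concrete, the following is a minimal illustrative sketch in Python. It is not the Speak4it implementation: the names `TurnInput`, `InfoState`, and `update`, the two slots (`topic`, `route`), and the carry-over rule are all assumptions chosen for illustration. The sketch assumes each turn may supply a spoken topic, a traced route, or both, and that any slot the new turn leaves empty is inherited from the previous state.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class TurnInput:
    """One user turn (hypothetical): either or both modalities may be present."""
    speech_topic: Optional[str] = None                 # e.g. "gas stations"
    gesture_route: Optional[Tuple[Point, ...]] = None  # points traced on the screen


@dataclass(frozen=True)
class InfoState:
    """Hypothetical information state carried across turns."""
    topic: Optional[str] = None
    route: Optional[Tuple[Point, ...]] = None
    turn: int = 0


def update(state: InfoState, turn_input: TurnInput) -> InfoState:
    """Fill slots from the new turn; keep earlier values for slots it leaves empty."""
    topic = turn_input.speech_topic if turn_input.speech_topic is not None else state.topic
    route = turn_input.gesture_route if turn_input.gesture_route is not None else state.route
    return replace(state, topic=topic, route=route, turn=state.turn + 1)


# Turn 1: speech and gesture arrive together.
state = update(InfoState(), TurnInput(speech_topic="gas stations",
                                      gesture_route=((0.0, 0.0), (1.0, 2.0))))
# Turn 2: speech only; the route from turn 1 is assumed to carry over.
state = update(state, TurnInput(speech_topic="coffee shops"))
```

In a one-shot system the second turn would be interpreted with no route at all; under the assumed carry-over rule, the state after turn 2 pairs the new topic with the route traced in turn 1.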