An exploration of gesture-speech multimodal patterns for touch interfaces

Multimodal interfaces that integrate multiple input modalities, such as speech, gestures, and gaze, have shown considerable promise in terms of higher task efficiency, lower error rates, and higher user satisfaction. However, the adoption of such interfaces in real-world systems has been slow, and the reasons may be both technological (e.g. the accuracy of recognition engines, fusion engines, authoring) and usability-related. In this paper, we explore a few patterns of "command and control" style multimodal interaction (MMI) using touch gestures and short speech utterances. We then describe a multimodal interface for a photo browsing application and a user study to understand some of the usability issues with such interfaces. Specifically, we study walk-up use of multimodal commands for photo manipulations and compare it with unimodal multi-touch interactions. We observe that there is a learning period after which users become more comfortable with the multimodal commands, and average task completion times decrease significantly. We also analyze temporal integration patterns of speech and touch gestures. We see this as the first of many studies leading to a more detailed understanding of user preferences and performance when using MMI, which can help inform the judicious use of MMI in designing interactions for future interfaces.
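To make the "command and control" fusion idea concrete, the following minimal Python sketch pairs a touch gesture with a nearby speech utterance using a simple temporal-integration window. It is not the paper's implementation: the event types, field names, and the 1.5-second window are illustrative assumptions.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical event types; the paper does not prescribe a data model.
@dataclass
class TouchGesture:
    kind: str          # e.g. "tap", "lasso"
    target: str        # identifier of the touched photo
    timestamp: float

@dataclass
class SpeechCommand:
    verb: str          # e.g. "rotate", "delete"
    timestamp: float

FUSION_WINDOW_S = 1.5  # assumed integration window, not a value from the study

def fuse(gesture: TouchGesture, speech: SpeechCommand) -> Optional[dict]:
    """Pair a touch gesture with a speech utterance if they arrive
    close enough in time (a simple temporal-integration rule)."""
    if abs(gesture.timestamp - speech.timestamp) <= FUSION_WINDOW_S:
        return {"action": speech.verb, "target": gesture.target}
    return None

# Example: the user taps a photo and says "rotate" shortly afterwards.
now = time.time()
command = fuse(TouchGesture("tap", "photo_42", now),
               SpeechCommand("rotate", now + 0.4))
print(command)  # {'action': 'rotate', 'target': 'photo_42'}
```

In practice, the temporal integration patterns reported later in the paper bear directly on how such a window should be chosen, since users may speak before, during, or after the accompanying gesture.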