Gestures or speech? Comparing modality selection for different interaction tasks in a virtual environment

In this paper, we investigate whether users prefer speech or gesture input for four distinct interaction tasks commonly found in virtual environments: navigation, selection, dialogue, and object manipulation. For this purpose, we implemented an interactive storytelling scenario in which users could choose between gesture and speech commands for each interaction. Both input modalities were processed in real time using a low-cost depth sensor and a microphone. We conducted a study to identify the modality preferences for each task. We found clear preferences for the navigation task, for which gestural interaction seemed more suitable, and for the dialogue task, for which speech was favoured. For the object manipulation and selection tasks, we did not observe a clear preference for either modality, but by analysing the participants’ ratings of their interaction experience we found indications of why some participants chose speech and others preferred gestures.