Improving Response Time through Multimodal Integration Pattern Modeling

While researchers have focused primarily on accuracy when addressing multimodal input segmentation, response time (or latency) has been largely overlooked, despite its unquestionable importance. We propose an input segmentation method based on integration pattern modeling that significantly improves response time over state-of-the-art approaches while maintaining remarkably high accuracy (98–99%). To this end, a new Bayesian belief network classification model was designed, drawing on recent empirical evidence about users' multimodal integration patterns. The model is employed in a procedure that segments related inputs into multimodal units. Using this procedure, response time can be reduced to 0.8 s for sequential integrators and even below 0.5 s for simultaneous integrators, a relative improvement of at least 20% and 50%, respectively. Although demonstrated on a combination of speech and gestures, the suggested approach can be generalized to a broad range of other modality combinations.
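
To make this concrete, the following minimal sketch (not the authors' actual model) illustrates how a small Bayesian classifier could decide whether a gesture and a spoken command belong to the same multimodal unit, using the user's integration pattern (simultaneous vs. sequential) and the discretized temporal relation between the two inputs as evidence. All priors, conditional probabilities, thresholds, and names below are hypothetical placeholders.

    # Illustrative sketch only: a toy Bayesian decision on whether a gesture and a
    # speech segment form one multimodal unit. The structure, CPT values, and the
    # 0.5 s gap threshold are assumptions, not the model from the paper.

    PRIOR_FUSE = {True: 0.6, False: 0.4}  # assumed prior P(same multimodal unit)

    # Assumed P(temporal relation | fuse?, integration pattern)
    LIKELIHOOD = {
        ("simultaneous", "overlap"):   {True: 0.85, False: 0.10},
        ("simultaneous", "short_gap"): {True: 0.10, False: 0.30},
        ("simultaneous", "long_gap"):  {True: 0.05, False: 0.60},
        ("sequential",   "overlap"):   {True: 0.20, False: 0.10},
        ("sequential",   "short_gap"): {True: 0.70, False: 0.30},
        ("sequential",   "long_gap"):  {True: 0.10, False: 0.60},
    }

    def temporal_relation(speech_start, speech_end, gesture_start, gesture_end,
                          short_gap_s=0.5):
        """Discretize the timing of the two inputs into overlap / short_gap / long_gap."""
        if speech_start <= gesture_end and gesture_start <= speech_end:
            return "overlap"
        gap = min(abs(speech_start - gesture_end), abs(gesture_start - speech_end))
        return "short_gap" if gap <= short_gap_s else "long_gap"

    def fuse_probability(pattern, relation):
        """Posterior P(fuse | pattern, relation) via Bayes' rule over the toy network."""
        joint = {f: PRIOR_FUSE[f] * LIKELIHOOD[(pattern, relation)][f]
                 for f in (True, False)}
        return joint[True] / (joint[True] + joint[False])

    if __name__ == "__main__":
        # A simultaneous integrator whose gesture overlaps the spoken command:
        rel = temporal_relation(0.0, 1.2, 0.3, 0.9)
        print(f"relation={rel}, P(fuse)={fuse_probability('simultaneous', rel):.2f}")

In such a segmentation procedure, a high posterior would allow the combined unit to be dispatched immediately instead of waiting for a fixed time window to expire, which is where the response-time gains over threshold-based fusion would presumably come from.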
