An Efficient Multimodal Language Processor for Parallel Input Strings in Multimodal Input Fusion

Multimodal user interaction technology aims to build more natural and intuitive interfaces that allow a user to interact with a computer in a way similar to human-to-human communication, for example through speech and gesture. As a critical component of multimodal user interaction, multimodal input fusion explores ways to derive a combined semantic interpretation of user inputs made through multiple modalities. This paper proposes a new, efficient unification-based multimodal language processor that can handle parallel input strings for multimodal input fusion. Through a structure-sharing technique, it has the potential to achieve low polynomial computational complexity while parsing multimodal inputs in versatile styles. The applicability of the proposed processor has been validated in an experiment with multimodal commands collected from traffic incident management scenarios. A description of the proposed multimodal language processor and preliminary experimental results are presented.
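
To make the fusion step concrete, the sketch below shows how a unification-based fuser might combine a spoken command with a co-temporal pointing gesture. This is a minimal illustration, not the paper's processor: it assumes feature structures are plain Python dicts, uses a simple temporal-overlap constraint, and omits the chart parsing and structure sharing that give the proposed processor its low polynomial complexity. The example command ("close that lane" plus a point at a lane) is hypothetical, loosely modeled on the traffic incident management domain.

    # A minimal sketch of unification-based multimodal fusion.
    # All names (unify, fuse, the example command) are illustrative,
    # not taken from the paper's implementation.

    def unify(a, b):
        """Recursively unify two feature structures (dicts); return None on clash."""
        result = dict(a)
        for key, b_val in b.items():
            if key not in result:
                result[key] = b_val
            else:
                a_val = result[key]
                if isinstance(a_val, dict) and isinstance(b_val, dict):
                    sub = unify(a_val, b_val)
                    if sub is None:
                        return None
                    result[key] = sub
                elif a_val != b_val:
                    return None  # conflicting atomic values: unification fails
        return result

    def overlaps(span_a, span_b, tolerance=1.0):
        """Temporal constraint: the two inputs must be (nearly) co-temporal."""
        return span_a[0] <= span_b[1] + tolerance and span_b[0] <= span_a[1] + tolerance

    def fuse(speech, gesture):
        """Fuse one speech and one gesture input if they overlap in time and unify."""
        if not overlaps(speech["time"], gesture["time"]):
            return None
        return unify(speech["fs"], gesture["fs"])

    # Hypothetical example: "close that lane" spoken while pointing at lane 3.
    speech = {"time": (0.0, 1.2),
              "fs": {"act": "close", "object": {"type": "lane"}}}
    gesture = {"time": (0.4, 0.9),
               "fs": {"object": {"type": "lane", "id": 3}}}

    print(fuse(speech, gesture))
    # -> {'act': 'close', 'object': {'type': 'lane', 'id': 3}}

In a full processor, each modality contributes a parallel input string rather than a single item, and a chart parser with structure sharing avoids re-deriving the same partial interpretations across the many possible cross-modal combinations; the sketch shows only the unification and temporal checks at the heart of each combination step.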
