An efficient unification-based multimodal language processor in multimodal input fusion

A Multimodal User Interface (MMUI) allows a user to interact with a computer in a way similar to human-to-human communication, for example, through speech and gesture. As an essential component of an MMUI, Multimodal Input Fusion must derive the semantic interpretation of a user's intention from recognized multimodal symbols that are semantically complementary. We enhanced our efficient unification-based multimodal parsing processor, which has the potential to achieve low polynomial computational complexity while parsing versatile multimodal inputs in a speech- and gesture-based MMUI, so that it can handle inputs from more than two modalities. Its ability to disambiguate speech recognition results with gesture recognition results was verified experimentally; analysis of the results shows a significant improvement after applying this technique.
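To illustrate the idea behind unification-based fusion (a minimal Python sketch, not the processor described in the paper; all feature names and values here are hypothetical), complementary symbols from different modalities can be represented as feature structures and merged by recursive unification, which fails when two features clash:

    def unify(a, b):
        # Unify two feature structures (nested dicts); return the merged
        # structure, or None when two atomic values clash.
        result = dict(a)
        for key, b_val in b.items():
            if key not in result:
                result[key] = b_val
                continue
            a_val = result[key]
            if isinstance(a_val, dict) and isinstance(b_val, dict):
                merged = unify(a_val, b_val)
                if merged is None:
                    return None  # clash inside a sub-structure
                result[key] = merged
            elif a_val != b_val:
                return None  # atomic feature clash, e.g. act=move vs act=delete
        return result

    # Speech "move this unit" leaves the destination underspecified;
    # a pointing gesture supplies the complementary location feature.
    speech = {"act": "move", "object": {"type": "unit"}}
    gesture = {"object": {"id": 42}, "location": {"x": 120, "y": 75}}

    print(unify(speech, gesture))
    # {'act': 'move', 'object': {'type': 'unit', 'id': 42},
    #  'location': {'x': 120, 'y': 75}}

In the same spirit, a gesture that unifies with only one of several speech recognition hypotheses rules out the others, which is how gesture input can disambiguate noisy speech recognition results.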
