An Exploration of Eye Gaze in Spoken Language Processing for Multimodal Conversational Interfaces

Motivated by psycholinguistic findings, we are investigating the role of eye gaze in spoken language understanding for multimodal conversational systems. Our assumption is that, during human-machine conversation, a user's eye gaze on the graphical display indicates the salient entities on which the user's attention is focused. Domain information about these salient entities is likely to be the content of the communication, and can therefore be used to constrain speech hypotheses and aid language understanding. Based on this assumption, this paper describes an exploratory study that incorporates eye gaze into salience modeling for spoken language processing. Our empirical results show that eye gaze has the potential to improve automated language processing. Because eye gaze is subconscious and involuntary during human-machine conversation, it provides an attention signal at no extra cost to the user. Our work motivates more in-depth investigation of eye gaze in attention prediction and its implications for automated language processing.
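The core idea described above can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's actual model: fixation counts on on-screen entities are normalized into a salience distribution, which is then used to rescore an ASR n-best list, boosting hypotheses that mention salient entities. All names (`gaze_salience`, `rescore`, the `weight` parameter, the example data) are assumptions for illustration only.

```python
# Hypothetical sketch of gaze-driven salience rescoring (assumed, not the
# paper's model): gaze fixations on displayed entities yield a salience
# distribution; hypotheses mentioning salient entities get a score boost.
from collections import Counter


def gaze_salience(fixations):
    """Normalize per-entity fixation counts into a salience distribution."""
    counts = Counter(fixations)
    total = sum(counts.values())
    return {entity: n / total for entity, n in counts.items()}


def rescore(nbest, salience, entity_words, weight=0.5):
    """Add weighted entity salience to each hypothesis's ASR score."""
    rescored = []
    for hyp, score in nbest:
        words = set(hyp.lower().split())
        # Boost by the salience of every entity whose lexical forms appear.
        boost = sum(s for e, s in salience.items()
                    if entity_words.get(e, set()) & words)
        rescored.append((hyp, score + weight * boost))
    return sorted(rescored, key=lambda x: x[1], reverse=True)


# Toy example: the user fixated mostly on the lamp while speaking.
fixations = ["lamp", "lamp", "table", "lamp"]
salience = gaze_salience(fixations)               # lamp: 0.75, table: 0.25
nbest = [("turn on the lab", -1.0),               # ASR log-score, misheard
         ("turn on the lamp", -1.1)]              # correct but lower-scored
entity_words = {"lamp": {"lamp"}, "table": {"table"}}
best_hyp = rescore(nbest, salience, entity_words)[0][0]
```

In this toy case the gaze-derived boost is enough to promote the correct hypothesis over the acoustically preferred misrecognition, which is the effect the study sets out to measure.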
