Combining body pose, gaze, and gesture to determine intention to interact in vision-based interfaces

Vision-based interfaces, such as those made popular by the Microsoft Kinect, suffer from the Midas Touch problem: every user motion can be interpreted as an interaction. In response, we developed an algorithm that combines facial features, body pose, and motion to approximate a user's intention to interact with the system. We show how this can be used to determine when to pay attention to a user's actions and when to ignore them. To demonstrate the value of our approach, we present results from a 30-person lab study comparing four engagement algorithms in single- and multi-user scenarios. We found that combining intention to interact with a 'raise an open hand in front of you' gesture yielded the best results: this combined approach offers a 12% improvement in accuracy and a 20% reduction in time to engage over a baseline 'wave to engage' gesture currently used on the Xbox 360.
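A minimal sketch of the idea described above, assuming hypothetical per-frame features (face orientation, torso pose, motion energy), hand-tuned weights, and a fixed threshold: an intention-to-interact score is computed from the features and engagement is triggered only when that score is high and an open hand is raised. This is an illustration under those assumptions, not the paper's actual model or implementation.

```python
# Sketch only: feature names, weights, and threshold below are illustrative
# assumptions, not the algorithm evaluated in the paper.

from dataclasses import dataclass


@dataclass
class FrameFeatures:
    face_toward_sensor: float   # 0..1, e.g. from head-pose/gaze estimation
    torso_facing_sensor: float  # 0..1, from skeletal body pose
    body_motion: float          # 0..1, normalized motion energy
    hand_raised: bool           # open hand held up in front of the body


def intention_score(f: FrameFeatures) -> float:
    """Weighted combination of features approximating intention to interact.
    The weights here are placeholders chosen for illustration."""
    w_face, w_pose, w_motion = 0.5, 0.3, 0.2
    return (w_face * f.face_toward_sensor
            + w_pose * f.torso_facing_sensor
            + w_motion * f.body_motion)


def should_engage(f: FrameFeatures, threshold: float = 0.6) -> bool:
    """Engage only when intention is high AND the explicit gesture is present,
    mirroring the 'intention to interact + raise an open hand' condition."""
    return f.hand_raised and intention_score(f) >= threshold


if __name__ == "__main__":
    frame = FrameFeatures(face_toward_sensor=0.9, torso_facing_sensor=0.8,
                          body_motion=0.3, hand_raised=True)
    print(should_engage(frame))  # True: user faces the sensor with hand raised
```

In practice the mapping from features to an engagement decision would be learned from labeled data rather than hand-weighted, but the gating structure (continuous intention score plus an explicit gesture) is the point being illustrated.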
