Multi-modal user interaction method based on gaze tracking and gesture recognition

This paper presents a gaze tracking technology that provides a convenient, human-centric interface for multimedia consumption without any wearable device. It enables a user to interact with various multimedia content on a large display at a distance by tracking the user's movement and acquiring high-resolution eye images. The paper also presents a gesture recognition technology that helps users interact with scene descriptions by controlling and rendering scene objects; it is based on hidden Markov models (HMMs) and conditional random fields (CRFs) and uses a commercial depth sensor. Finally, the paper describes how these new sensors can be combined with MPEG standards to achieve interoperability among interactive applications, new user interaction devices, and users.
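To make the HMM-based recognition pipeline concrete, the following is a minimal sketch, not the authors' implementation: one Gaussian HMM is trained per gesture class over per-frame depth-sensor features, and a new sequence is labeled by the model with the highest log-likelihood. The hmmlearn library, the feature layout, and the helper names are assumptions for illustration; the paper's actual system additionally uses CRFs, which this sketch omits.

```python
import numpy as np
from hmmlearn import hmm  # assumed dependency, not named in the paper


def train_gesture_models(sequences_by_gesture, n_states=5):
    """Fit one Gaussian HMM per gesture class.

    sequences_by_gesture: dict mapping a gesture label to a list of
    (T_i, D) arrays of per-frame features, e.g. normalized 3D joint
    positions from a commercial depth sensor (hypothetical layout).
    """
    models = {}
    for gesture, seqs in sequences_by_gesture.items():
        X = np.vstack(seqs)                # stack all frames of this class
        lengths = [len(s) for s in seqs]   # per-sequence frame counts
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=30)
        m.fit(X, lengths)                  # Baum-Welch over all sequences
        models[gesture] = m
    return models


def classify(models, observation):
    """Label a new (T, D) feature sequence by maximum log-likelihood."""
    return max(models, key=lambda g: models[g].score(observation))
```

In a likelihood-ratio variant, a separate "non-gesture" threshold model can be scored alongside the class models so that spurious movements are rejected rather than forced into the nearest gesture class.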
