A Vision-Based Remote Control

This chapter presents a vision-based system for touch-free interaction with a display at a distance. A single camera is mounted on top of the screen, pointing toward the user. An attention mechanism allows the user to start the interaction and control a screen pointer by moving their hand, held in a fist pose, toward the camera. On-screen items can be chosen by a selection mechanism. Current sample applications include browsing video collections and viewing a gallery of 3D objects, which the user can rotate with their hand motion. We include an up-to-date review of hand tracking methods and comment on the merits and shortcomings of previous approaches. The proposed tracker uses multiple cues (appearance, color, and motion) for robustness. As the space of possible observation models is generally too large for exhaustive online search, we select models that are suitable for the particular tracking task at hand. During a training stage, various off-the-shelf trackers are evaluated. Using these data, different methods of fusing the trackers online are investigated, including parallel and cascaded tracker evaluation. For the case of fist tracking, combining a small number of observers in a cascade results in an efficient algorithm that is used in our gesture interface. The system has been on public display at conferences, where over a hundred users have engaged with it.
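The difference between parallel and cascaded tracker evaluation can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the abstract does not specify the observers, their confidence measures, or the fusion rule, so the `Stage` type, the per-stage `threshold`, the "first confident observer wins" rule, and the toy observers below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Position = Tuple[float, float]
# Hypothetical observer interface: (frame, previous position) -> (estimate, confidence in [0, 1]).
Observer = Callable[[object, Position], Tuple[Position, float]]


@dataclass
class Stage:
    name: str
    observer: Observer
    threshold: float  # assumed per-stage acceptance threshold


def track_cascade(stages: List[Stage], frame: object, prev: Position):
    """Evaluate observers in order and stop at the first confident estimate.

    Returns (position, stage name, number of observers evaluated);
    position and name are None if no observer was confident enough.
    """
    for evaluated, stage in enumerate(stages, start=1):
        pos, conf = stage.observer(frame, prev)
        if conf >= stage.threshold:
            return pos, stage.name, evaluated
    return None, None, len(stages)


def track_parallel(stages: List[Stage], frame: object, prev: Position):
    """Evaluate every observer and keep the most confident estimate."""
    if not stages:
        return None, None, 0
    results = [(stage.observer(frame, prev), stage.name) for stage in stages]
    (pos, _conf), name = max(results, key=lambda r: r[0][1])
    return pos, name, len(stages)


# Toy observers standing in for motion, color, and appearance cues.
calls: List[str] = []


def motion(frame, prev):
    calls.append("motion")
    return (1.0, 2.0), 0.3


def color(frame, prev):
    calls.append("color")
    return (1.5, 2.5), 0.8


def appearance(frame, prev):
    calls.append("appearance")
    return (1.4, 2.6), 0.9


demo_stages = [
    Stage("motion", motion, 0.5),
    Stage("color", color, 0.5),
    Stage("appearance", appearance, 0.5),
]

if __name__ == "__main__":
    print(track_cascade(demo_stages, None, (0.0, 0.0)))
    print(track_parallel(demo_stages, None, (0.0, 0.0)))
```

In this toy setup the cascade stops as soon as one observer is confident, so later observers are never run for that frame, whereas the parallel variant always pays for every observer. That trade-off is one plausible reading of why a short cascade can be efficient; the ordering and thresholds would in practice come from the training stage described above.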
