Active Vision Laboratory: 3D Hand and Object Tracking for Intention Recognition

The actions and intended actions of humans performing a wide range of tasks appear, for the most part, to be effortlessly understood by others looking on. Automatic understanding of human intentions is, of course, much less advanced, but it is certainly key to enhancing human interaction with computers and computer-controlled machinery. Visual sensing has a highly significant advantage over other sensing modalities in this rôle, providing a passive and non-intrusive way of acquiring information about actions and ‘body language’, which can then be used to predict movements and intention. The focus of this project is the recovery of the position, motion and grasp type of the human hand and forearm. Unlike computer sign language recognition, this will be done in the context of surrounding objects in the manipulation space, and it places importance on representing hands and objects in a three-dimensional coordinate frame. The task suggests that maintaining an explicit 3D articulated model of the hand may be useful. This report on the work done so far first gives an overview of the prior art in this area and makes critical comments about the literature. A prerequisite for locating hands and objects in images is object segmentation. We have evaluated the distribution of skin colour samples and noted that a method for pixel classification based on colour can be effective. We then describe a histogram-based method and show some experiments on skin colour classification for hand segmentation. A method for tracking rigid objects in 3D using three calibrated views was implemented. This method is based on the RAPiD system. It was evaluated on synthetic and real images, using parameters that allow real-time implementation. A framework for tracking a specific pose of the hand from the view of a single small wearable camera is described. This system is based on the skin detection method described earlier, a shape detector and the RAPiD tracker.
Preliminary conclusions are drawn in this report, and the study was used to develop a timetabled plan for future work. The author is in receipt of a doctoral scholarship provided by CAPES Foundation Brazil (BEX 1550/00-4).
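
The histogram-based skin colour classification mentioned above could be sketched roughly as follows. This is an illustrative sketch only, not the report's implementation: the RGB colour space, the 8 bins per channel, the additive smoothing, the likelihood-ratio threshold and all function names are assumptions made for the example.

```python
from collections import Counter

BINS = 8  # assumption: 8 bins per RGB channel (not specified in the report)


def quantise(pixel, bins=BINS):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse colour bin."""
    step = 256 // bins
    r, g, b = pixel
    return (r // step, g // step, b // step)


def build_histogram(pixels, bins=BINS):
    """Count how many labelled training pixels fall into each colour bin."""
    return Counter(quantise(p, bins) for p in pixels)


def skin_likelihood_ratio(pixel, skin_hist, bg_hist, bins=BINS, eps=1.0):
    """Return P(colour | skin) / P(colour | background).

    Additive smoothing (eps) is an assumption that keeps empty bins
    from producing a division by zero.
    """
    key = quantise(pixel, bins)
    n_bins = bins ** 3
    p_skin = (skin_hist[key] + eps) / (sum(skin_hist.values()) + eps * n_bins)
    p_bg = (bg_hist[key] + eps) / (sum(bg_hist.values()) + eps * n_bins)
    return p_skin / p_bg


def is_skin(pixel, skin_hist, bg_hist, threshold=1.0):
    """Label a pixel as skin when the likelihood ratio exceeds the threshold."""
    return skin_likelihood_ratio(pixel, skin_hist, bg_hist) > threshold


# Toy training samples (hypothetical values, for illustration only).
skin_hist = build_histogram([(220, 170, 140), (210, 160, 130), (200, 150, 120)])
bg_hist = build_histogram([(30, 60, 200), (20, 200, 40), (240, 240, 240)])
```

In a sketch like this, the threshold trades false positives against missed skin pixels, and the bin count trades colour resolution against the amount of training data needed to fill the histograms.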
