3D Hand Pose Reconstruction Using Specialized Mappings

A system for recovering 3D hand pose from monocular color sequences is proposed. The system employs a non-linear supervised learning framework, the specialized mappings architecture (SMA), to map image features to likely 3D hand poses. The SMA’s fundamental components are a set of specialized forward mapping functions and a single feedback matching function. The forward functions are estimated directly from training data, which in our case are examples of hand joint configurations and their corresponding visual features. The joint angle data in the training set is obtained via a CyberGlove, a glove with 22 sensors that monitor the angular motions of the palm and fingers. In training, the visual features are generated using a computer graphics module that renders the hand from arbitrary viewpoints given the 22 joint angles. The viewpoint is encoded by two real values, so a hand pose is represented by 24 real values. We test our system both on synthetic sequences and on sequences taken with a color camera. The system automatically detects and tracks both hands of the user, calculates the appropriate features, and estimates the 3D hand joint angles and viewpoint from those features. Results are encouraging given the complexity of the task.
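The interplay between the forward mapping functions and the feedback matching function can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that the feedback function maps a candidate pose back to feature space and that the candidate whose predicted features lie closest to the observed features is selected. The names `sma_infer`, `toy_feedback`, `forward_a`, and `forward_b`, and the three-value toy feature vector, are hypothetical and stand in for learned mappings and real image features.

```python
from typing import Callable, List, Sequence

Vector = List[float]

N_JOINTS = 22            # CyberGlove joint angles, as stated in the text
POSE_DIM = N_JOINTS + 2  # plus two viewpoint values = 24


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sma_infer(
    features: Vector,
    forward_maps: Sequence[Callable[[Vector], Vector]],
    feedback: Callable[[Vector], Vector],
) -> Vector:
    """Return the candidate pose whose fed-back features best match the input."""
    # Each specialized forward map proposes one pose hypothesis.
    candidates = [f(features) for f in forward_maps]
    # Assumed matching rule: compare fed-back features with observed features.
    errors = [squared_distance(feedback(p), features) for p in candidates]
    best = min(range(len(candidates)), key=errors.__getitem__)
    return candidates[best]


# Toy stand-ins (hypothetical): features are [sum of joint angles, view1, view2].
def toy_feedback(pose: Vector) -> Vector:
    return [sum(pose[:N_JOINTS]), pose[N_JOINTS], pose[N_JOINTS + 1]]


def forward_a(features: Vector) -> Vector:
    # Spreads the observed angle sum evenly over all joints.
    return [features[0] / N_JOINTS] * N_JOINTS + features[1:3]


def forward_b(features: Vector) -> Vector:
    # Always proposes a flat hand, ignoring the angle sum.
    return [0.0] * N_JOINTS + features[1:3]


observed = [11.0, 0.5, -0.2]
pose = sma_infer(observed, [forward_a, forward_b], toy_feedback)
```

In this toy setting `forward_a` reproduces the observed features exactly after feedback, so its hypothesis is chosen over the flat-hand hypothesis from `forward_b`. In the system described above, the forward functions would instead be learned from glove data and rendered features.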
