Fast stochastic optimization for articulated structure tracking

An optimization approach for fast visual tracking of articulated structures, based on stochastic meta-descent (SMD) [7], was presented recently. SMD is a gradient descent method with local step size adaptation that combines rapid convergence with excellent scalability. Stochastic sampling helps the optimization avoid local minima. We extend the SMD algorithm with new features for fast and accurate tracking: the step sizes are adapted both between and within video frames, and a robust cost function incorporates both depth and surface orientation. 3D hand tracking experiments support the advantages of the resulting tracker over state-of-the-art methods. A realistic deformable hand model further supports the accuracy of our tracker.
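
The idea of gradient descent with local step size adaptation can be illustrated with a minimal sketch. The code below follows the commonly published form of the SMD update (one step size per parameter, adapted multiplicatively from the correlation between the current gradient and an auxiliary vector v), applied to a toy quadratic cost. It is not the tracker described above: the cost, the function names (smd_minimize, grad, hess_vec), and the constants p0, mu, and lam are all assumptions chosen for illustration, the gradient is deterministic rather than stochastically sampled, and the Hessian-vector product is supplied analytically.

```python
# Toy sketch of an SMD-style update (not the tracker from the abstract).
# Assumptions: deterministic gradient, diagonal quadratic cost, analytic
# Hessian-vector product. All names and constants are hypothetical.

def smd_minimize(grad, hess_vec, theta, steps=500, p0=0.05, mu=0.05, lam=0.9):
    """Minimize a cost using one adaptive step size per parameter."""
    theta = list(theta)
    p = [p0] * len(theta)   # local step sizes
    v = [0.0] * len(theta)  # tracks how theta depends on log step sizes
    for _ in range(steps):
        g = grad(theta)
        hv = hess_vec(theta, v)
        # Grow a step size when the gradient keeps its sign, shrink it when
        # the sign flips; the factor is bounded below by 1/2.
        p = [pi * max(0.5, 1.0 - mu * gi * vi) for pi, gi, vi in zip(p, g, v)]
        # Plain gradient step, scaled per parameter.
        theta = [ti - pi * gi for ti, pi, gi in zip(theta, p, g)]
        # Exponentially decayed update of the auxiliary vector.
        v = [lam * vi - pi * (gi + lam * hvi)
             for vi, pi, gi, hvi in zip(v, p, g, hv)]
    return theta, p


# Toy cost: 0.5 * sum(h_i * (theta_i - c_i)^2) with per-axis curvature h_i.
CURVATURE = [1.0, 4.0]
TARGET = [1.0, -2.0]


def grad(theta):
    return [h * (t - c) for h, t, c in zip(CURVATURE, theta, TARGET)]


def hess_vec(theta, v):
    # The Hessian is diag(CURVATURE), so H @ v is an elementwise product.
    return [h * vi for h, vi in zip(CURVATURE, v)]


theta, step_sizes = smd_minimize(grad, hess_vec, [0.0, 0.0])
print(theta, step_sizes)
```

In a real tracker the gradient would come from the image-based cost evaluated on a random subset of sample points, and the Hessian-vector product would typically be computed without forming the Hessian explicitly.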

[1] Cristian Sminchisescu, et al. Covariance scaled sampling for monocular 3D body tracking, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[2] Pascal Fua, et al. Articulated Soft Objects for Video-based Body Modeling, 2001, ICCV.

[3] Ying Wu, et al. Capturing natural hand articulation, 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[4] Michael H. Lin. Tracking articulated objects in real-time range image sequences, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[5] Michael J. Black, et al. EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation, 1996, ECCV.

[6] Marc Levoy, et al. Real-time 3D model acquisition, 2002, ACM Trans. Graph.

[7] Daniel Thalmann, et al. Simulation of object and human skin formations in a grasping task, 1989, SIGGRAPH.

[8] Luc Van Gool, et al. One-shot active 3D shape acquisition, 1996, Proceedings of 13th International Conference on Pattern Recognition.

[9] Andrew Blake, et al. Articulated body motion capture by annealed particle filtering, 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No. PR00662).

[10] Takeo Kanade, et al. Model-based tracking of self-occluding articulated objects, 1995, Proceedings of IEEE International Conference on Computer Vision.

[11] Luc Van Gool, et al. Smart particle filtering for 3D hand tracking, 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings.

[12] David J. Fleet, et al. Stochastic Tracking of 3D Human Figures Using 2D Image Motion, 2000, ECCV.

[13] Daniel Thalmann, et al. Joint-dependent local deformations for hand animation and object grasping, 1989.

[14] Dimitri P. Bertsekas, et al. Nonlinear Programming, 1997.

[15] Takeo Kanade, et al. Visual Tracking of High DOF Articulated Structures: an Application to Human Hand Tracking, 1994, ECCV.

[16] Rómer Rosales, et al. 3D Hand Pose Reconstruction Using Specialized Mappings, 2001, ICCV.

[17] William H. Press, et al. Numerical recipes in C, 2002.

[18] Richard S. Sutton, et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta, 1992, AAAI.

[19] Manfred K. Warmuth, et al. Additive versus exponentiated gradient updates for linear prediction, 1995, STOC '95.

[20] John P. Lewis, et al. Pose Space Deformation: A Unified Approach to Shape Interpolation and Skeleton-Driven Deformation, 2000, SIGGRAPH.

[21] Carlo Tomasi, et al. 3D tracking = classification + interpolation, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[22] Michael J. Black, et al. EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation, 1996, International Journal of Computer Vision.

[23] Edwin Catmull, et al. A system for computer generated movies, 1972, ACM Annual Conference.

[24] Kenneth Levenberg. A Method for the Solution of Certain Non-Linear Problems in Least Squares, 1944.

[25] Thibault Langlois, et al. Parameter adaptation in stochastic optimization, 1999.

[26] Robert A. Jacobs, et al. Increased rates of convergence through learning rate adaptation, 1987, Neural Networks.

[27] Stan Sclaroff, et al. An appearance-based framework for 3D hand shape classification and camera viewpoint estimation, 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[28] Luc Van Gool, et al. Real-time range scanning of deformable surfaces by adaptively coded structured light, 2003, Fourth International Conference on 3-D Digital Imaging and Modeling, 2003. 3DIM 2003. Proceedings.

[29] Thomas S. Huang, et al. Vision based hand modeling and tracking for virtual teleconferencing and telecollaboration, 1995, Proceedings of IEEE International Conference on Computer Vision.

[30] Michael Isard, et al. CONDENSATION: Conditional Density Propagation for Visual Tracking, 1998, International Journal of Computer Vision.

[31] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.

[32] Matthew Brand, et al. Shadow puppetry, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[33] David C. Hogg, et al. Towards 3D hand tracking using a deformable model, 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[34] Paul J. Besl, et al. A Method for Registration of 3-D Shapes, 1992, IEEE Trans. Pattern Anal. Mach. Intell.

[35] Yoshiaki Shirai, et al. Real-time 3D hand posture estimation based on 2D appearance retrieval using monocular camera, 2001, Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems.

[36] Nicol N. Schraudolph, et al. 3D hand tracking by rapid stochastic gradient descent using a skinning model, 2004.

[37] Koji Komatsu, et al. Human skin model capable of natural shape variation, 1988, The Visual Computer.

[38] Olivier D. Faugeras, et al. 3D Articulated Models and Multiview Tracking with Physical Forces, 2001, Comput. Vis. Image Underst.

[39] James M. Rehg, et al. A multiple hypothesis approach to figure tracking, 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No. PR00149).

[40] R. Plankers, et al. Articulated soft objects for video-based body modeling, 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[41] D. Marquardt. An Algorithm for Least-Squares Estimation of Nonlinear Parameters, 1963.