Articulated object tracking by rendering consistent appearance parts

We describe a general methodology for tracking three-dimensional objects in monocular and stereo video that combines GPU-accelerated filtering and rendering with machine learning techniques. The method operates on targets consisting of kinematic chains with known geometry. The tracked target is divided into one or more areas of consistent appearance. The appearance of each area is represented by a classifier trained to assign a class-conditional probability to image feature vectors. A search is then performed over the configuration space of the target to find the maximum-likelihood configuration. During the search, each candidate hypothesis is evaluated by rendering a 3D model of the target object and measuring its consistency with the class probability map. The method is demonstrated for tool tracking on videos from two surgical domains and for a human hand-tracking task.
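The render-and-compare loop described above can be illustrated with a small toy sketch. It is not the authors' implementation: it assumes a planar two-link chain instead of a 3D model, a CPU rasterizer instead of GPU rendering, a synthetic class probability map instead of a trained classifier, a per-pixel Bernoulli log-likelihood as the consistency measure, and a coarse grid followed by coordinate refinement as a stand-in search. All names (`render_mask`, `log_likelihood`, `coarse_grid`, `search`) and all constants are hypothetical.

```python
import math

SIZE = 32                    # image is SIZE x SIZE pixels (illustrative value)
LINK_LENGTHS = (10.0, 8.0)   # assumed known geometry of the two links, in pixels
HALF_WIDTH = 1.5             # assumed half-thickness of each link, in pixels


def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    denom = dx * dx + dy * dy
    t = 0.0 if denom == 0.0 else ((px - ax) * dx + (py - ay) * dy) / denom
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def render_mask(config):
    """Rasterize the chain for joint angles (theta1, theta2) into a pixel set."""
    theta1, theta2 = config
    base = (SIZE / 2.0, SIZE / 2.0)
    elbow = (base[0] + LINK_LENGTHS[0] * math.cos(theta1),
             base[1] + LINK_LENGTHS[0] * math.sin(theta1))
    tip = (elbow[0] + LINK_LENGTHS[1] * math.cos(theta1 + theta2),
           elbow[1] + LINK_LENGTHS[1] * math.sin(theta1 + theta2))
    links = ((base, elbow), (elbow, tip))
    return {
        (x, y)
        for y in range(SIZE)
        for x in range(SIZE)
        if any(dist_to_segment((x, y), a, b) <= HALF_WIDTH for a, b in links)
    }


def log_likelihood(config, prob_map):
    """Score a hypothesis: log p on rendered pixels, log(1 - p) elsewhere."""
    mask = render_mask(config)
    total = 0.0
    for y in range(SIZE):
        for x in range(SIZE):
            p = min(max(prob_map[y][x], 1e-6), 1.0 - 1e-6)  # avoid log(0)
            total += math.log(p if (x, y) in mask else 1.0 - p)
    return total


def coarse_grid(steps):
    """Evenly spaced joint-angle pairs covering [-pi, pi) in both joints."""
    angles = [2.0 * math.pi * i / steps - math.pi for i in range(steps)]
    return [(a, b) for a in angles for b in angles]


def search(prob_map, steps=16, min_step=1e-3, max_iters=200):
    """Coarse grid search, then coordinate refinement with a shrinking step."""
    best, best_score = None, -math.inf
    for cfg in coarse_grid(steps):
        score = log_likelihood(cfg, prob_map)
        if score > best_score:
            best, best_score = cfg, score
    step = math.pi / steps
    for _ in range(max_iters):
        if step <= min_step:
            break
        moves = ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step))
        scored = [(log_likelihood((best[0] + d1, best[1] + d2), prob_map),
                   (best[0] + d1, best[1] + d2)) for d1, d2 in moves]
        top_score, top_cfg = max(scored)
        if top_score > best_score:
            best, best_score = top_cfg, top_score   # accept the best neighbor
        else:
            step /= 2.0                             # no gain: refine the step
    return best, best_score


# Synthetic stand-in for a classifier output: high probability on the target.
TRUE_CONFIG = (0.6, -0.9)
true_mask = render_mask(TRUE_CONFIG)
prob_map = [[0.9 if (x, y) in true_mask else 0.1 for x in range(SIZE)]
            for y in range(SIZE)]
true_score = log_likelihood(TRUE_CONFIG, prob_map)
estimate, estimate_score = search(prob_map)
print("estimated joint angles:", estimate)
print("log-likelihood (estimate vs. true):", estimate_score, true_score)
```

Because the rasterized objective is piecewise constant in the joint angles, the sketch uses a derivative-free search. In a real tracker the previous frame's estimate would presumably seed the search, so the coarse grid stage would be needed only for initialization.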
