Tracking Motion, Deformation, and Texture Using Conditionally Gaussian Processes

We present a generative model and inference algorithm for 3D nonrigid object tracking. The model, which we call G-flow, enables the joint inference of 3D position, orientation, and nonrigid deformations, as well as object texture and background texture. Optimal inference under G-flow reduces to a conditionally Gaussian stochastic filtering problem. The optimal solution to this problem reveals a new space of computer vision algorithms, of which classic approaches such as optic flow and template matching are special cases that are optimal only under special circumstances. We evaluate G-flow on the problem of tracking facial expressions and head motion in 3D from single-camera video. Previously, the lack of realistic video data with ground truth nonrigid position information has hampered the rigorous evaluation of nonrigid tracking. We introduce a practical method of obtaining such ground truth data and present a new face video data set that was created using this technique. Results on this data set show that G-flow is much more robust and accurate than current deterministic optic-flow-based approaches.
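The claim that inference reduces to a conditionally Gaussian filtering problem can be illustrated with a toy Rao-Blackwellized particle filter. This is a minimal sketch, not the G-flow algorithm, and it rests on several assumptions:

- The "image" is a 1D signal.
- Pose is a single integer circular shift, standing in for 3D pose and deformation.
- Texture pixels are independent Gaussians with a small, hypothetical drift variance.
- New poses are proposed from the random-walk prior.

Given a particle's pose history, the texture posterior is exactly Gaussian. Each particle therefore carries per-pixel Kalman means and variances, and its weight is the Kalman predictive likelihood of the new frame. The names simulate, rb_particle_filter, obs_var, and tex_drift are illustrative, not taken from the paper.

```python
import numpy as np


def simulate(n_pixels=32, n_frames=30, obs_var=0.01, seed=0):
    """Hypothetical toy 'video': a fixed random 1D texture shifted by an integer pose."""
    rng = np.random.default_rng(seed)
    texture = rng.normal(0.0, 1.0, n_pixels)
    poses = [0]  # assume the initial pose is known
    for _ in range(n_frames - 1):
        # Random-walk pose: step of -1, 0, or +1 pixels (circular).
        poses.append((poses[-1] + rng.integers(-1, 2)) % n_pixels)
    noise_sd = np.sqrt(obs_var)
    frames = np.array([np.roll(texture, u) + rng.normal(0.0, noise_sd, n_pixels)
                       for u in poses])
    return texture, np.array(poses), frames


def rb_particle_filter(frames, n_particles=200, obs_var=0.01,
                       tex_var0=1.0, tex_drift=1e-4, seed=1):
    """Rao-Blackwellized particle filter: particles over pose, Kalman filters over texture."""
    rng = np.random.default_rng(seed)
    n_frames, n_pixels = frames.shape
    poses = np.zeros(n_particles, dtype=int)          # every particle starts at pose 0
    mean = np.zeros((n_particles, n_pixels))          # texture posterior means
    var = np.full((n_particles, n_pixels), tex_var0)  # diagonal texture posterior variances
    estimates = []
    for t in range(n_frames):
        if t > 0:
            # Non-Gaussian part: sample each particle's new pose from the random-walk prior.
            poses = (poses + rng.integers(-1, 2, n_particles)) % n_pixels
        # Warp the frame back into texture coordinates under each particle's pose.
        z = np.stack([np.roll(frames[t], -u) for u in poses])
        # Conditionally Gaussian part: per-pixel Kalman predict step.
        var_pred = var + tex_drift
        s = var_pred + obs_var  # innovation (predictive) variance
        resid = z - mean
        loglik = -0.5 * np.sum(resid**2 / s + np.log(2 * np.pi * s), axis=1)
        # Kalman update of each particle's texture given its pose.
        gain = var_pred / s
        mean = mean + gain * resid
        var = (1.0 - gain) * var_pred
        # Weight = Kalman predictive likelihood (normalized in log space for stability).
        w = np.exp(loglik - loglik.max())
        w /= w.sum()
        estimates.append(np.bincount(poses, weights=w, minlength=n_pixels).argmax())
        # Multinomial resampling of poses together with their texture statistics.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        poses, mean, var = poses[idx], mean[idx], var[idx]
    return np.array(estimates), mean[0], var[0]


texture, true_poses, frames = simulate()
est_poses, tex_mean, tex_var = rb_particle_filter(frames)
print("pose errors:", np.count_nonzero(est_poses != true_poses))
```

In this toy, the two limits of tex_drift behave differently:

- A very large tex_drift drives the Kalman gain toward 1. Each particle's texture then becomes essentially the last registered frame, so pose is scored against the previous image, which is an optic-flow-like limit.
- With tex_drift set to 0 and a texture already known with near-zero variance, the gain stays near 0. Pose is then scored against a fixed template.

This loosely mirrors the special cases named in the abstract.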