Estimation algorithms for ambiguous visual models: Three Dimensional Human Modeling and Motion Reconstruction in Monocular Video Sequences. (Algorithmes d'estimation pour des modèles visuels ambigus : Modélisation Humaine Tridimensionnelle et Reconstruction du Mouvement dans des Séquences Vidéo Mon

This thesis studies the problem of tracking and reconstructing three-dimensional articulated human motion in monocular video sequences. The problem is important for applications such as markerless motion capture for animation and virtual reality, video indexing, human-computer interaction, and intelligent surveillance. A system that reconstructs 3D human motion from single-camera sequences faces difficulties caused by the lossy nature of monocular projection and the high dimensionality required for 3D human modeling. The complexity of human articular structure and shape, their physical constraints, and the large variability in image observations involving humans make the solution non-trivial. We focus on the general problem of 3D human motion estimation from monocular video streams. Hence, we cannot exploit the simplifications brought by multiple cameras or by strong dynamical models such as walking, and we minimize assumptions about clothing and background structure. In this unrestricted setting, the posterior likelihoods over human pose space are inevitably highly multi-modal, and efficiently locating and tracking the most prominent peaks is a major computational challenge. To address these problems, we propose a model that incorporates realistic kinematics and several important human body constraints, together with a principled, robust and probabilistically motivated integration of different visual cues such as contours, intensity and silhouettes. We then derive three novel continuous multiple-hypothesis search techniques that allow either deterministic or stochastic localization of nearby peaks in the high-dimensional human pose likelihood surface: Covariance Scaled Sampling, Eigenvector Tracking and Hypersurface Sweeping, and Hyperdynamic Importance Sampling. The search methods give general, principled approaches to the deterministic exploration of the non-convex error surfaces so often encountered in computational vision problems.
The combined system allows monocular tracking of unconstrained human motions in clutter.
