Visual hull construction, alignment and refinement for human kinematic modeling, motion tracking and rendering

The ability to build precise human kinematic models and to track human motion accurately is essential in a wide variety of applications. Because of the complexity of the human body and the problem of self-occlusion, modeling and tracking humans with cameras are challenging tasks. In this thesis, we develop algorithms to perform these two tasks based on the shape estimation method Shape-From-Silhouette (SFS), which constructs a shape estimate (known as the Visual Hull) of an object from its silhouette images. In the first half of this thesis we extend the traditional SFS algorithm so that it can be used effectively for human-related applications. To perform SFS in real time, we propose a fast testing/projection algorithm for voxel-based SFS algorithms. Moreover, we combine silhouette information over time to effectively increase the number of cameras (and hence the reconstruction detail) for SFS without physically adding new cameras. We first propose a new Visual Hull representation called Bounding Edges. We then analyze the ambiguity problem of aligning two Visual Hulls. Based on this analysis, we develop an algorithm to align Visual Hulls over time using stereo and an important property of the Shape-From-Silhouette principle. This temporal SFS algorithm combines geometric constraints and photometric consistency to align Colored Surface Points of the object extracted from the silhouette and color images. Once the Visual Hulls are aligned, they are refined by compensating for the motion of the object. The algorithm is developed for both rigid and articulated objects. In the second half of this thesis we show how the improved SFS algorithms are used to perform the tasks of human modeling and motion tracking. First, we build a system to acquire human kinematic models consisting of precise shape and joint locations. Once the kinematic models are built, they are used to track the motion of the person in new video sequences. The tracking algorithm is based on the Visual Hull alignment idea used in the temporal SFS algorithms. Finally, we demonstrate how the kinematic model and the tracked motion data can be used for image-based rendering and motion transfer between two people.
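
As a rough illustration of the voxel-based SFS idea mentioned above, the following toy sketch keeps a voxel only if its projection falls inside every silhouette. It is not the fast testing/projection algorithm proposed in the thesis: the axis-aligned orthographic "cameras", the tiny 2x2x2 grid, and all function and variable names are assumptions made for this example.

```python
from itertools import product

def project_orthographic(voxel, axis):
    """Project a voxel to a 2D pixel by dropping the coordinate along `axis`.
    (Assumed toy camera model: orthographic views along the grid axes.)"""
    return tuple(c for i, c in enumerate(voxel) if i != axis)

def silhouette_of(shape, axis):
    """Silhouette of a voxel set as seen along `axis`: the set of covered pixels."""
    return {project_orthographic(v, axis) for v in shape}

def carve_visual_hull(silhouettes, size):
    """Voxel-based SFS: keep each voxel whose projection lies inside
    every given silhouette. `silhouettes` maps a viewing axis to a pixel set."""
    return {
        voxel
        for voxel in product(range(size), repeat=3)
        if all(project_orthographic(voxel, axis) in sil
               for axis, sil in silhouettes.items())
    }

# Hypothetical L-shaped object inside a 2x2x2 voxel grid.
size = 2
obj = {(0, 0, 0), (1, 0, 0), (0, 1, 0)}

# Two views leave one extra voxel; a third view carves it away.
hull_two = carve_visual_hull({a: silhouette_of(obj, a) for a in (0, 1)}, size)
hull_three = carve_visual_hull({a: silhouette_of(obj, a) for a in (0, 1, 2)}, size)

print(len(hull_two))      # 4
print(hull_three == obj)  # True
```

The Visual Hull always contains the object, and each additional silhouette can only shrink it. This is why combining silhouettes across time, once the Visual Hulls are aligned, acts like adding cameras.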

[66]  Takeo Kanade,et al.  Constructing virtual worlds using dense stereo , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[67]  Nebojsa Jojic,et al.  Tracking self-occluding articulated objects in dense disparity maps , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[68]  A. Bernhardt Traces : Wireless full body tracking in the CAVE , 1999 .

[69]  Olivier D. Faugeras,et al.  3D articulated models and multi-view tracking with silhouettes , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[70]  Aldo Laurentini The visual hull of curved objects , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[71]  Wojciech Matusik,et al.  Creating and Rendering Image-Based Visual Hulls , 1999 .

[72]  James M. Rehg,et al.  A multiple hypothesis approach to figure tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[73]  Paul A. Viola,et al.  Roxels: responsibility weighted 3D volume reconstruction , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[74]  Richard Szeliski,et al.  An integrated Bayesian approach to layer extraction from image sequences , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[75]  Takeo Kanade,et al.  Appearance-based virtual view generation of temporally-varying events from multi-camera images in the 3D room , 1999, Second International Conference on 3-D Digital Imaging and Modeling (Cat. No.PR00062).

[76]  Zhengyou Zhang,et al.  Flexible camera calibration by viewing a plane from unknown orientations , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[77]  R. Plankers,et al.  Automated body modeling from video sequences , 1999, Proceedings IEEE International Workshop on Modelling People. MPeople'99.

[78]  James M. Rehg,et al.  Dynamic feature ordering for efficient registration , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[79]  Pascal Fua,et al.  Human Shape and Motion Recovery Using Animation Models , 2000 .

[80]  Takeo Kanade,et al.  Shape and motion carving in 6D , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[81]  Paulo R. S. Mendonça,et al.  Camera Pose Estimation and Reconstruction from Image Profiles under Circular Motion , 2000, ECCV.

[82]  Jessica K. Hodgins,et al.  Automatic Joint Parameter Estimation from Magnetic Motion Capture Data , 2023, Graphics Interface.

[83]  Larry S. Davis,et al.  Non-parametric Model for Background Subtraction , 2000, ECCV.

[84]  David J. Fleet,et al.  Stochastic Tracking of 3D Human Figures Using 2D Image Motion , 2000, ECCV.

[85]  Ramesh Raskar,et al.  Image-based visual hulls , 2000, SIGGRAPH.

[86]  David Edward DiFranco,et al.  Recovery of 3D articulated motion from 2D correspondences , 2000 .

[87]  Carlo Tomasi,et al.  Alpha estimation in natural images , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[88]  Evangelos Kokkevis,et al.  Skinning Characters using Surface Oriented Free-Form Deformations , 2000, Graphics Interface.

[89]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[90]  Ioannis A. Kakadiaris,et al.  Estimating anthropometry and pose from a single image , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[91]  Michael J. Black,et al.  A framework for modeling the appearance of 3D articulated figures , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[92]  Appearance-Based Virtual-View Generation for Fly Through in a Real Dynamic Scene , 2000, VisSym.

[93]  R. Zabih,et al.  Exact voxel occupancy with graph cuts , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[94]  Mohan M. Trivedi,et al.  Articulated body posture estimation from multi-camera voxel data , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[95]  Marc Levoy,et al.  Efficient variants of the ICP algorithm , 2001, Proceedings Third International Conference on 3-D Digital Imaging and Modeling.

[96]  Roberto Cipolla,et al.  Real-time tracking of highly articulated structures in the presence of noisy measurements , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[97]  Wojciech Matusik,et al.  Polyhedral Visual Hulls for Real-Time Rendering , 2001, Rendering Techniques.

[98]  Brendan J. Frey,et al.  Learning flexible sprites in video layers , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[99]  Aldo Laurentini,et al.  Interactive reconstruction of 3D objects from silhouettes , 2001 .

[100]  Takeo Kanade,et al.  A subspace approach to layer extraction , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[101]  Thomas B. Moeslund,et al.  A Survey of Computer Vision-Based Human Motion Capture , 2001, Comput. Vis. Image Underst..

[102]  James M. Rehg,et al.  Reconstruction of 3D figure motion from 2D correspondences , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[103]  Jean Ponce,et al.  On computing exact visual hulls of solids bounded by smooth surfaces , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[104]  Roberto Cipolla,et al.  Structure and motion from silhouettes , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[105]  Mohammed Yeasin,et al.  Automatic Acquisition and Initialization of Kinematic Models , 2001 .

[106]  David W. Jacobs,et al.  Judging Whether Multiple Silhouettes Can Come from the Same Object , 2001, IWVF.

[107]  C. Waggoner Combining Dynamic Simulation , High Dynamic Range Photography and Global Illumination , 2001 .

[108]  R. Cipolla,et al.  A probabilistic framework for space carving , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[109]  Takeo Kanade,et al.  A Characterization of Inherent Stereo Ambiguities , 2001, ICCV.

[110]  R. Plankers,et al.  Articulated soft objects for video-based body modeling , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[111]  Paulo R. S. Mendonça,et al.  Epipolar geometry from profiles under circular motion , 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[112]  Cary B. Phillips,et al.  Multi-weight enveloping: least-squares approximation techniques for skin animation , 2002, SCA '02.

[113]  Pascal Fua,et al.  Markerless Full Body Shape and Motion Capture from Video Sequences , 2002 .

[114]  Yaron Caspi,et al.  Increasing Space-Time Resolution in Video , 2002, ECCV.

[115]  Takeo Kanade,et al.  Spatio-Temporal View Interpolation , 2002, Rendering Techniques.

[116]  Stefan Carlsson,et al.  Recognizing and Tracking Human Action , 2002, ECCV.

[117]  David J. Fleet,et al.  A Layered Motion Representation with Occlusion and Compact Spatial Support , 2002, ECCV.

[118]  Tal Hassner,et al.  What Does the Scene Look Like from a Scene Point? , 2002, ECCV.

[119]  Zoran Popovic,et al.  The space of human body shapes: reconstruction and parameterization from range scans , 2003, ACM Trans. Graph..

[120]  Takeo Kanade,et al.  Visual hull alignment and refinement across time: a 3D reconstruction algorithm combining shape-from-silhouette with stereo , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[121]  Katsushi Ikeuchi,et al.  Illumination from Shadows , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[122]  Hans-Peter Seidel,et al.  Free-viewpoint video of human actors , 2003, ACM Trans. Graph..

[123]  Jovan Popovic,et al.  Continuous capture of skin deformation , 2003, ACM Trans. Graph..

[124]  Mohammed Yeasin,et al.  Automatic acquisition and initialization of articulated models , 2003, Machine Vision and Applications.

[125]  Mohan M. Trivedi,et al.  Human Body Model Acquisition and Tracking Using Voxel Data , 2003, International Journal of Computer Vision.

[126]  Aaron F. Bobick,et al.  Fast Lighting Independent Background Subtraction , 2004, International Journal of Computer Vision.

[127]  Stefan Carlsson,et al.  Uncalibrated Motion Capture Exploiting Articulated Structure Constraints , 2004, International Journal of Computer Vision.

[128]  Takeo Kanade,et al.  Shape and motion from image streams under orthography: a factorization method , 1992, International Journal of Computer Vision.

[129]  Kiriakos N. Kutulakos,et al.  A Theory of Shape by Space Carving , 2000, International Journal of Computer Vision.

[130]  Kiriakos N. Kutulakos,et al.  Plenoptic Image Editing , 2004, International Journal of Computer Vision.

[131]  Michal Irani,et al.  Computing occluding and transparent motions , 1994, International Journal of Computer Vision.

[132]  Zhengyou Zhang,et al.  Iterative point matching for registration of free-form curves and surfaces , 1994, International Journal of Computer Vision.

[133]  Takeo Kanade,et al.  Image-based spatio-temporal modeling and view interpolation of dynamic events , 2005, TOGS.

[134]  Takeo Kanade,et al.  Three-dimensional scene flow , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[135]  Anil K. Jain Fundamentals of Digital Image Processing , 2018, Control of Color Imaging Systems.