Shape from Selfies: Human Body Shape Estimation Using CCA Regression Forests

In this work, we revisit the problem of human body shape estimation from monocular imagery. Starting from a statistical human shape model that describes a body shape with shape parameters, we describe a novel approach to automatically estimate these parameters from a single input shape silhouette using semi-supervised learning. By utilizing silhouette features that encode local and global properties and are robust to noise, pose and view changes, and by projecting them to lower-dimensional spaces obtained through multi-view learning with canonical correlation analysis (CCA), we show how regression forests can be used to compute an accurate mapping from the silhouette to the shape parameter space. This results in a very fast, robust and automatic system under mild self-occlusion assumptions. We extensively evaluate our method on thousands of synthetic and real samples and compare it to state-of-the-art approaches that operate under more restrictive assumptions.
