Creatures great and SMAL: Recovering the shape and motion of animals from video

We present a system to recover the 3D shape and motion of a wide variety of quadrupeds from video. The system comprises a machine learning front-end which predicts candidate 2D joint positions, a discrete optimization which finds kinematically plausible joint correspondences, and an energy minimization stage which fits a detailed 3D model to the image. In order to overcome the limited availability of motion capture training data from animals, and the difficulty of generating realistic synthetic training images, the system is designed to work on silhouette data. The joint candidate predictor is trained on synthetically generated silhouette images, and at test time, deep learning methods or standard video segmentation tools are used to extract silhouettes from real data. The system is tested on animal videos from several species, and shows accurate reconstructions of 3D shape and pose.

[1]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[2]  Peter V. Gehler,et al.  Poselet Conditioned Pictorial Structures , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Alan L. Yuille,et al.  Joint Object and Part Segmentation Using Deep Learned Potentials , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[5]  Michael J. Black,et al.  OpenDR: An Approximate Differentiable Renderer , 2014, ECCV.

[6]  Andrew W. Fitzgibbon,et al.  What Shape Are Dolphins? Building 3D Morphable Models from 2D Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Alan L. Yuille,et al.  Semantic part segmentation using compositional model combining shape and appearance , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Michael J. Black,et al.  3D Menagerie: Modeling the 3D Shape and Pose of Animals , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Stephen P. Boyd,et al.  CVXPY: A Python-Embedded Modeling Language for Convex Optimization , 2016, J. Mach. Learn. Res..

[10]  Yi Yang,et al.  Articulated Human Detection with Flexible Mixtures of Parts , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Kai Chen,et al.  Video Object Segmentation with Re-identification , 2017, ArXiv.

[13]  John H. Holland,et al.  Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .

[14]  Andrew Zisserman,et al.  SilNet : Single- and Multi-View Reconstruction by Learning from Silhouettes , 2017, BMVC.

[15]  Yu Chen,et al.  Inferring 3D Shapes and Deformations from Single Views , 2010, ECCV.

[16]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[17]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[18]  Jürgen Bernard,et al.  FuryExplorer: visual-interactive exploration of horse motion capture data , 2015, Electronic Imaging.

[19]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Bolei Zhou,et al.  Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Sanja Fidler,et al.  Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[24]  W. Walthen-Dunn A Transformation for Extracting New De scriptors of Shape ' , in , 2017 .

[25]  Andrew W. Fitzgibbon,et al.  Learning an efficient model of hand shape variation from depth images , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Thomas Brox,et al.  Lucid Data Dreaming for Object Tracking , 2017, ArXiv.

[27]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[28]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[29]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[31]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[32]  Michael J. Black,et al.  Lions and Tigers and Bears: Capturing Non-rigid, 3D, Articulated Shape from Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Stephen P. Boyd,et al.  General Heuristics for Nonconvex Quadratically Constrained Quadratic Programming , 2017, 1703.07870.

[34]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Manolis I. A. Lourakis,et al.  Is Levenberg-Marquardt the most efficient optimization algorithm for implementing bundle adjustment? , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[36]  Marie-Paule Cani,et al.  Animal gaits from video , 2004, SCA '04.

[37]  Kevin M. Cury,et al.  DeepLabCut: markerless pose estimation of user-defined body parts with deep learning , 2018, Nature Neuroscience.

[38]  Hans-Peter Seidel,et al.  Animated 3D Creatures from Single-view Video by Skeletal Sketching , 2016, Graphics Interface.

[39]  Ignas Budvytis,et al.  Indirect deep structured learning for 3D human body shape and pose prediction , 2017, BMVC.

[40]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[41]  Georgios Tzimiropoulos,et al.  Human Pose Estimation via Convolutional Part Heatmap Regression , 2016, ECCV.

[42]  Bernt Schiele,et al.  Monocular 3D pose estimation and tracking by detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.