RGBD-Dog: Predicting Canine Pose from RGBD Sensors

The automatic extraction of animal 3D pose from images without markers is of interest in a range of scientific fields. Most work to date predicts animal pose from RGB images, based on 2D labelling of joint positions. However, due to the difficult nature of obtaining training data, no ground truth dataset of 3D animal motion is available to quantitatively evaluate these approaches. In addition, a lack of 3D animal pose data also makes it difficult to train 3D pose-prediction methods in a similar manner to the popular field of body-pose prediction. In our work, we focus on the problem of 3D canine pose estimation from RGBD images, recording a diverse range of dog breeds with several Microsoft Kinect v2s, simultaneously obtaining the 3D ground truth skeleton via a motion capture system. We generate a dataset of synthetic RGBD images from this data. A stacked hourglass network is trained to predict 3D joint locations, which is then constrained using prior models of shape and pose. We evaluate our model on both synthetic and real RGBD images and compare our results to previously published work fitting canine models to images. Finally, despite our training set consisting only of dog data, visual inspection implies that our network can produce good predictions for images of other quadrupeds – e.g. horses or cats – when their pose is similar to that contained in our training set.

[1]  Wenbin Li,et al.  InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset , 2018, BMVC.

[2]  Mikhail Kislin,et al.  Fast animal pose estimation using deep neural networks , 2018, Nature Methods.

[3]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Andrew W. Fitzgibbon,et al.  Creatures great and SMAL: Recovering the shape and motion of animals from video , 2018, ACCV.

[5]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[6]  Andrew W. Fitzgibbon,et al.  Accurate, Robust, and Flexible Real-time Hand Tracking , 2015, CHI.

[7]  Kevin M. Cury,et al.  DeepLabCut: markerless pose estimation of user-defined body parts with deep learning , 2018, Nature Neuroscience.

[8]  Kyoung Mu Lee,et al.  V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Neil D. Lawrence,et al.  Hierarchical Gaussian process latent variable models , 2007, ICML '07.

[10]  Yichen Wei,et al.  Integral Human Pose Regression , 2017, ECCV.

[11]  Jessica K. Hodgins,et al.  Synthesizing physically realistic human motion in low-dimensional, behavior-specific spaces , 2004, SIGGRAPH 2004.

[12]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[13]  Zhenhua Wang,et al.  Synthesizing Training Images for Boosting Human 3D Pose Estimation , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[14]  Jessica K. Hodgins,et al.  Synthesizing physically realistic human motion in low-dimensional, behavior-specific spaces , 2004, ACM Trans. Graph..

[15]  Michael J. Black,et al.  3D Menagerie: Modeling the 3D Shape and Pose of Animals , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Cewu Lu,et al.  Cross-Domain Adaptation for Animal Pose Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Neil D. Lawrence,et al.  Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data , 2003, NIPS.

[18]  Alexander M. Bronstein,et al.  Calculus of Nonrigid Surfaces for Geometry and Texture Manipulation , 2007, IEEE Transactions on Visualization and Computer Graphics.

[19]  Cordelia Schmid,et al.  MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild , 2016, NIPS.

[20]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[22]  Christian Theobalt,et al.  Real-Time Hand Tracking Under Occlusion from an Egocentric RGB-D Sensor , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[23]  Matthias Bethge,et al.  Using DeepLabCut for 3D markerless pose estimation across species and behaviors , 2018 .

[24]  Cordelia Schmid,et al.  Learning from Synthetic Humans , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Michael J. Black,et al.  Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture From Images “In the Wild” , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[26]  Andrew W. Fitzgibbon,et al.  The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[28]  B. D. Adelstein,et al.  Calculus of Nonrigid Surfaces for Geometry and Texture Manipulation , 2007 .

[29]  Iain D. Couzin,et al.  Fast and robust animal pose estimation , 2019 .

[30]  Andrew Gardiner The Dog Anatomy Workbook , 2014 .

[31]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[32]  Vincent Lepetit,et al.  DeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[33]  Michael J. Black,et al.  Lions and Tigers and Bears: Capturing Non-rigid, 3D, Articulated Shape from Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Peter V. Gehler,et al.  A Generative Model of People in Clothing , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Pascal Fua,et al.  Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision , 2016, 2017 International Conference on 3D Vision (3DV).

[36]  Chi Xu,et al.  Mouse Pose Estimation From Depth Images , 2015, ArXiv.

[37]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[38]  Florian Echtler,et al.  libfreenect2: open-source library for Kinect v2 depth camera, release 0.1.1 , 2016 .

[39]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[40]  Alexander M. Bronstein,et al.  Efficient Computation of Isometry-Invariant Distances Between Surfaces , 2006, SIAM J. Sci. Comput..