Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose tracking

Noninvasive behavioral tracking of animals is crucial for many scientific investigations. Recent transfer learning approaches for behavioral tracking have considerably advanced the state of the art. Typically these methods treat each video frame and each object to be tracked independently. In this work, we improve on these methods (particularly in the regime of few training labels) by leveraging the rich spatiotemporal structures pervasive in behavioral video: specifically, the spatial statistics imposed by physical constraints (e.g., paw-to-elbow distance) and the temporal statistics imposed by smoothness from frame to frame. We propose a probabilistic graphical model built on top of deep neural networks, Deep Graph Pose (DGP), to leverage these useful spatial and temporal constraints, and develop an efficient structured variational approach to perform inference in this model. The resulting semi-supervised model exploits both labeled and unlabeled frames to achieve significantly more accurate and robust tracking while requiring users to label fewer training frames. In turn, these tracking improvements enhance performance on downstream applications, including robust unsupervised segmentation of behavioral “syllables,” and estimation of interpretable “disentangled” low-dimensional representations of the full behavioral video. Open source code is available at https://github.com/paninski-lab/deepgraphpose.
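The abstract does not spell out the model's potentials, so what follows is only a minimal sketch of how spatial and temporal constraints can be combined. It is not the DGP model or its structured variational inference. It assumes hard per-frame keypoint detections `y` (for example, the argmax of each network score map) and three quadratic, Gaussian-style penalties. The first keeps each estimate near its detection. The second penalizes frame-to-frame displacement, which enforces temporal smoothness. The third penalizes the separation of keypoints joined by a skeleton edge, acting as a soft proximity constraint of the paw-near-elbow kind. Under these assumptions the MAP estimate solves one linear system whose matrix is block tridiagonal when frames are ordered in time. The function names and the weights `w_obs`, `lam_time` and `lam_space` are hypothetical.

```python
import numpy as np


def path_laplacian(n: int) -> np.ndarray:
    """Laplacian of the chain 0-1-...-(n-1); couples each frame to its neighbours."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L


def edge_laplacian(k: int, edges) -> np.ndarray:
    """Laplacian of a (hypothetical) skeleton graph on keypoints 0..k-1."""
    L = np.zeros((k, k))
    for j, l in edges:
        L[j, j] += 1.0
        L[l, l] += 1.0
        L[j, l] -= 1.0
        L[l, j] -= 1.0
    return L


def smooth_keypoints(y, edges, w_obs=1.0, lam_time=1.0, lam_space=0.1):
    """Sketch: MAP keypoint tracks under assumed quadratic penalties.

    y has shape (T, K, 2): noisy per-frame detections for K keypoints.
    Minimizes
        w_obs     * sum_{t,k}           ||x[t,k] - y[t,k]||^2
      + lam_time  * sum_{t,k}           ||x[t+1,k] - x[t,k]||^2
      + lam_space * sum_{t,(j,l)}       ||x[t,j] - x[t,l]||^2.
    Setting the gradient to zero gives A x = w_obs * y with
    A = w_obs*I + lam_time*(L_time kron I_K) + lam_space*(I_T kron L_skel).
    """
    y = np.asarray(y, dtype=float)
    T, K, D = y.shape
    A = (
        w_obs * np.eye(T * K)
        + lam_time * np.kron(path_laplacian(T), np.eye(K))  # same keypoint, adjacent frames
        + lam_space * np.kron(np.eye(T), edge_laplacian(K, edges))  # linked keypoints, same frame
    )
    # The two image coordinates decouple, so both are solved as columns of b.
    b = w_obs * y.reshape(T * K, D)
    x = np.linalg.solve(A, b)  # dense solve for clarity only
    return x.reshape(T, K, D)
```

As a usage example, consider one frame with two linked keypoints at (0, 0) and (3, 0). Calling `smooth_keypoints(np.array([[[0.0, 0.0], [3.0, 0.0]]]), edges=[(0, 1)], lam_space=1.0)` keeps their midpoint fixed and shrinks their separation from 3 to 1, returning points at (1, 0) and (2, 0). The dense `np.linalg.solve` keeps the sketch short but costs cubic time in T·K. A solver that exploits the block-tridiagonal structure would scale linearly with the number of frames.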