Robust solving of optical motion capture data by denoising

Raw optical motion capture data often contains errors such as occluded markers, mislabeled markers, and high-frequency noise or jitter. Typically these errors must be fixed by hand, an extremely time-consuming and tedious task, so there is a large demand for tools and techniques that can alleviate this burden. In this research we present a tool that sidesteps the problem: it produces joint transforms directly from raw marker data (a task commonly called "solving") in a way that is extremely robust to errors in the input data, using the machine learning technique of denoising. Starting with a set of marker configurations and a large database of skeletal motion data such as the CMU motion capture database [CMU 2013b], we synthetically reconstruct marker locations using linear blend skinning and apply a custom noise function to corrupt this marker data, randomly removing and shifting markers to dynamically produce billions of examples of poses with errors similar to those found in real motion capture data. We then train a deep denoising feed-forward neural network to learn a mapping from this corrupted marker data to the corresponding joint transforms. Once trained, our neural network can be used as a replacement for the solving part of the motion capture pipeline and, because it is very robust to errors, it completely removes the need for any manual clean-up of data. Our system is accurate enough to be used in production, generally achieving precision to within a few millimeters, while also being extremely fast to compute and having low memory requirements.
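
The data-generation step described above (skinning markers onto a skeleton, then corrupting them) could be sketched roughly as follows. This is a minimal illustration in NumPy, not the authors' implementation: the function names, array layouts, the probabilities `occlude_prob` and `shift_prob`, the uniform shift distribution, and the choice to represent an occluded marker as the zero vector are all assumptions made for the example.

```python
import numpy as np

def skin_markers(joint_transforms, offsets, weights):
    """Place markers with linear blend skinning (hypothetical layout).

    joint_transforms: (J, 3, 4) global [rotation | translation] per joint
    offsets:          (M, J, 3) marker offset in each joint's local frame
    weights:          (M, J)    skinning weights, each row summing to 1
    Returns an (M, 3) array of global marker positions.
    """
    rot = joint_transforms[:, :, :3]    # (J, 3, 3)
    trans = joint_transforms[:, :, 3]   # (J, 3)
    # Position of every marker as carried by every joint: (M, J, 3).
    per_joint = np.einsum("jab,mjb->mja", rot, offsets) + trans[None]
    # Blend the per-joint positions with the skinning weights.
    return np.einsum("mj,mja->ma", weights, per_joint)

def corrupt_markers(markers, rng, occlude_prob=0.1, shift_prob=0.1,
                    shift_scale=0.1):
    """Randomly remove and shift markers (assumed noise model).

    Occluded markers are zeroed out here; shifted markers receive a
    uniform offset in [-shift_scale, shift_scale] on each axis.
    Returns the corrupted (M, 3) markers and the boolean occlusion mask.
    """
    m = markers.shape[0]
    occluded = rng.random(m) < occlude_prob
    shifted = rng.random(m) < shift_prob
    noise = rng.uniform(-shift_scale, shift_scale, size=markers.shape)
    corrupted = markers + noise * shifted[:, None]
    corrupted[occluded] = 0.0
    return corrupted, occluded
```

In a training loop of the kind the abstract describes, each pose drawn from the skeletal database would be passed through `skin_markers` and then `corrupt_markers` with fresh random draws, so the network sees a differently corrupted input every time while regressing to the same clean joint transforms.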
