论文信息 - Efficient ConvNet-based marker-less motion capture in general scenes with a low number of cameras

Efficient ConvNet-based marker-less motion capture in general scenes with a low number of cameras

We present a novel method for accurate marker-less capture of articulated skeleton motion of several subjects in general scenes, indoors and outdoors, even from input filmed with as few as two cameras. Our approach unites a discriminative image-based joint detection method with a model-based generative motion tracking algorithm through a combined pose optimization energy. The discriminative part-based pose detection method, implemented using Convolutional Networks (ConvNet), estimates unary potentials for each joint of a kinematic skeleton model. These unary potentials are used to probabilistically extract pose constraints for tracking by using weighted sampling from a pose posterior guided by the model. In the final energy, these constraints are combined with an appearance-based model-to-image similarity term. Poses can be computed very efficiently using iterative local optimization, as ConvNet detection is fast, and our formulation yields a combined pose estimation energy with analytic derivatives. In combination, this enables to track full articulated joint angles at state-of-the-art accuracy and temporal stability with a very low number of cameras.

[1] Christian Szegedy,et al. DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2] Jitendra Malik,et al. Articulated Pose Estimation Using Discriminative Armlet Classifiers , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[3] Ahmed M. Elgammal,et al. Coupled Visual and Kinematic Manifold Models for Tracking , 2010, International Journal of Computer Vision.

[4] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[5] Hans-Peter Seidel,et al. Outdoor human motion capture using inverse kinematics and von mises-fisher sampling , 2011, 2011 International Conference on Computer Vision.

[6] Hans-Peter Seidel,et al. Drift-free tracking of rigid and articulated objects , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[7] Kwang In Kim,et al. Outdoor Human Motion Capture by Simultaneous Optimization of Pose and Camera Parameters , 2015, Comput. Graph. Forum.

[8] Peter V. Gehler,et al. Strong Appearance and Expressive Spatial Models for Human Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[9] Hans-Peter Seidel,et al. A data-driven approach for real-time full body pose reconstruction from a depth camera , 2011, 2011 International Conference on Computer Vision.

[10] Jonathan Tompson,et al. Learning Human Pose Estimation Features with Convolutional Networks , 2013, ICLR.

[11] Luc Van Gool,et al. Human Pose Estimation Using Body Parts Dependent Joint Regressors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[12] Yaser Sheikh,et al. Motion capture from body-mounted cameras , 2011, ACM Trans. Graph..

[13] Alan L. Yuille,et al. Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations , 2014, NIPS.

[14] Hans-Peter Seidel,et al. Spatio-temporal motion tracking with unsynchronized cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Stefan Carlsson,et al. 3D Pictorial Structures for Multiple View Articulated Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Jitendra Malik,et al. Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[17] Simon Haykin,et al. GradientBased Learning Applied to Document Recognition , 2001 .

[18] Bernt Schiele,et al. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19] Cristian Sminchisescu,et al. Twin Gaussian Processes for Structured Prediction , 2010, International Journal of Computer Vision.

[20] Hans-Peter Seidel,et al. Optimization and Filtering for Human Motion Capture , 2010, International Journal of Computer Vision.

[21] Bodo Rosenhahn,et al. Ieee Transactions on Pattern Analysis and Machine Intelligence Combined Region-and Motion-based 3d Tracking of Rigid and Articulated Objects , 2022 .

[22] Holger Wendland,et al. Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree , 1995, Adv. Comput. Math..

[23] Hans-Peter Seidel,et al. Markerless Motion Capture with unsynchronized moving cameras , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[24] Bernt Schiele,et al. Multi-view Pictorial Structures for 3D Human Pose Estimation , 2013, BMVC.

[25] Jonathan Tompson,et al. MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation , 2014, ACCV.

[26] Vittorio Ferrari,et al. Better Appearance Models for Pictorial Structures , 2009, BMVC.

[27] Hans-Peter Seidel,et al. Fast articulated motion tracking using a sums of Gaussians body model , 2011, 2011 International Conference on Computer Vision.

[28] Luc Van Gool,et al. Smart particle filtering for high-dimensional tracking , 2007, Comput. Vis. Image Underst..

[29] Daniel P. Huttenlocher,et al. Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[30] Antti Oulasvirta,et al. Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data , 2013, 2013 IEEE International Conference on Computer Vision.

[31] Qionghai Dai,et al. Performance Capture of Interacting Characters with Handheld Kinects , 2012, ECCV.

[32] Michael J. Black,et al. HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[33] Michael Isard,et al. Loose-limbed People: Estimating 3D Human Pose and Motion Using Non-parametric Belief Propagation , 2011, International Journal of Computer Vision.

[34] Ronald Poppe,et al. Vision-based human motion analysis: An overview , 2007, Comput. Vis. Image Underst..

[35] Rui Li,et al. 3D Human Motion Tracking with a Coordinated Mixture of Factor Analyzers , 2009, International Journal of Computer Vision.

[36] Bernt Schiele,et al. Pictorial structures revisited: People detection and articulated pose estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[37] Adrian Hilton,et al. A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[38] Jonathan Tompson,et al. Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[39] Juergen Gall,et al. Optimization and Filtering for Human Motion Capture , 2010, International Journal of Computer Vision.

[40] Peter V. Gehler,et al. Poselet Conditioned Pictorial Structures , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[41] Martin A. Fischler,et al. The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[42] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[43] Ben Taskar,et al. MODEC: Multimodal Decomposable Models for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[44] Yi Yang,et al. Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[45] Nassir Navab,et al. 3D Pictorial Structures for Multiple Human Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.