Real-Time Human Motion Capture with Multiple Depth Cameras

Commonly used human motion capture systems require intrusive attachment of markers that are visually tracked with multiple cameras. In this work we present an efficient and inexpensive solution to markerless motion capture using only a few Kinect sensors. Unlike the previous work on 3d pose estimation using a single depth camera, we relax constraints on the camera location and do not assume a co-operative user. We apply recent image segmentation techniques to depth images and use curriculum learning to train our system on purely synthetic data. Our method accurately localizes body parts without requiring an explicit shape model. The body joint locations are then recovered by combining evidence from multiple views in real-time. We also introduce a dataset of ~6 million synthetic depth frames for pose estimation from multiple cameras and exceed state-of-the-art results on the Berkeley MHAD dataset.

[1]  Sebastian Thrun,et al.  Real time motion capture using a single time-of-flight camera , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[3]  Frank P. Ferrie,et al.  Towards 3D human posture estimation using multiple kinects despite self-contacts , 2015, 2015 14th IAPR International Conference on Machine Vision Applications (MVA).

[4]  Sebastian Thrun,et al.  Real-Time Human Pose Tracking from Range Data , 2012, ECCV.

[5]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Ruzena Bajcsy,et al.  Berkeley MHAD: A comprehensive Multimodal Human Action Database , 2013, 2013 IEEE Workshop on Applications of Computer Vision (WACV).

[7]  James J. Little,et al.  Unlabelled 3D Motion Examples Improve Cross-View Action Recognition , 2014, BMVC.

[8]  Ruigang Yang,et al.  Real-Time Simultaneous Pose and Shape Estimation for Articulated Objects Using a Single Depth Camera , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Ruigang Yang,et al.  Real-Time Simultaneous Pose and Shape Estimation for Articulated Objects Using a Single Depth Camera , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[12]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[13]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[14]  Deva Ramanan,et al.  Articulated pose estimation with tiny synthetic videos , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[15]  James J. Little,et al.  3D Pose from Motion for Cross-View Action Recognition via Non-linear Circulant Temporal Encoding , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  C. Karen Liu,et al.  Leveraging depth cameras and wearable pressure sensors for full-body kinematics and dynamics capture , 2014, ACM Trans. Graph..

[17]  Ho Yub Jung,et al.  Random tree walk toward instantaneous 3D human pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[19]  Andrew W. Fitzgibbon,et al.  Efficient regression of general-activity human poses from depth images , 2011, 2011 International Conference on Computer Vision.

[20]  Antonis A. Argyros,et al.  Tracking the articulated motion of the human body with two RGBD cameras , 2014, Machine Vision and Applications.

[21]  Deva Ramanan,et al.  First-person pose recognition using egocentric workspaces , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Hans-Peter Seidel,et al.  Personalization and Evaluation of a Real-Time Depth-Based Full Body Tracker , 2013, 2013 International Conference on 3D Vision.

[23]  George Papandreou,et al.  Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).