Human Motion Tracking by Multiple RGBD Cameras

The advent of low-cost depth cameras for the consumer market, such as the Microsoft Kinect, has made many indoor applications and games based on motion tracking available to everyday users. However, tracking human motion with such a camera is challenging because of low-quality images, missing depth values, and noise. In this paper, we propose a novel human motion capture method based on a cooperative structure of multiple low-cost RGBD cameras, which effectively mitigates these problems. This structure also handles the body occlusions that arise when a single camera is used. Moreover, the whole process requires no training data, which makes the approach easy to deploy and reduces operation time. We use the color image, depth image, and point cloud acquired in each view as the data source, and extract an initial pose in our optimization framework by aligning the point clouds from the different cameras. The pose is then updated dynamically by combining a filtering approach with a Markov model to estimate new poses in the video streams. To verify the efficiency and robustness of our approach, we capture a wide variety of human actions with three cameras in indoor scenes and compare the tracking results of the proposed method with those of current state-of-the-art methods. We also test our system in more complex situations in which multiple humans move within a scene and may partially occlude each other. The actions of multiple humans are tracked simultaneously, which can assist group behavior analysis.
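As a rough illustration of what bringing point clouds from several cameras into one coordinate frame can involve, the sketch below applies a known rigid transform (rotation and translation) to each camera's points and skips missing depth samples. It is not the optimization framework of the paper: the function names `to_world` and `merge_clouds`, the use of `None` for a missing depth value, and the assumption that each camera's extrinsics are already known from calibration are all hypothetical choices made for this example.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]
Extrinsics = Tuple[Matrix3, Point]  # (rotation, translation); assumed known


def to_world(point: Point, rotation: Matrix3, translation: Point) -> Point:
    """Apply the rigid transform p_world = R * p_cam + t."""
    x, y, z = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (x, y, z)


def merge_clouds(
    clouds: Sequence[Sequence[Optional[Point]]],
    extrinsics: Sequence[Extrinsics],
) -> List[Point]:
    """Merge per-camera point clouds into one cloud in a shared frame.

    A point given as None stands for a missing depth value (a hypothetical
    convention for this sketch) and is skipped.
    """
    merged: List[Point] = []
    for cloud, (rotation, translation) in zip(clouds, extrinsics):
        for point in cloud:
            if point is None:
                continue
            merged.append(to_world(point, rotation, translation))
    return merged


# Camera 0 defines the shared frame (identity transform).
IDENTITY: Matrix3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Camera 1 is rotated 90 degrees about the z axis and shifted along x.
ROT_Z_90: Matrix3 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

cloud_cam0 = [(0.0, 0.0, 2.0), None]   # second sample has no depth
cloud_cam1 = [(1.0, 0.0, 0.0)]

fused = merge_clouds(
    [cloud_cam0, cloud_cam1],
    [(IDENTITY, (0.0, 0.0, 0.0)), (ROT_Z_90, (1.0, 0.0, 0.0))],
)
```

In this toy setup a region that one camera cannot see, or for which it reports no depth, can still be covered by another view, which is the intuition behind using several cameras against occlusion and missing depth values.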
