A survey of human motion analysis using depth imagery

Analysis of human behaviour through visual information has been a highly active research topic in the computer vision community. This was previously achieved using images from conventional cameras; recently, however, depth sensors have made a new type of data available. This survey begins by explaining the advantages of depth imagery and then describes the sensors now available to obtain it. In particular, the Microsoft Kinect has made high-resolution, real-time depth imagery cheaply available. The main published research on the use of depth imagery for analysing human activity is reviewed. Much of the existing work focuses on body part detection and pose estimation, while a growing research area addresses the recognition of human actions. The publicly available datasets that include depth imagery are listed, as are the software libraries that can acquire it from a sensor. The survey concludes by summarising the current state of work on this topic and pointing out promising future research directions. For researchers and practitioners alike, whether already familiar with this topic or new to the field, the review will aid in the selection and development of algorithms that use depth data.