Human activity recognition from 3D data: A review

Abstract: Human activity recognition has been an important area of computer vision research since the 1980s. Various approaches have been proposed, a large portion of which address the problem using conventional cameras. The past decade has witnessed rapid development of 3D data acquisition techniques. This paper summarizes the major techniques in human activity recognition from 3D data, with a focus on techniques that use depth data. Broad categories of algorithms are identified based on the features they use. The pros and cons of the algorithms in each category are analyzed, and possible directions for future research are indicated.
