A Review on Video-Based Human Activity Recognition

This review article extensively surveys current progress in video-based human activity recognition. It addresses three aspects of the field: core technology, human activity recognition systems, and applications, ranging from low-level to high-level representation. For the core technology, three critical processing stages are discussed thoroughly: human object segmentation, feature extraction and representation, and activity detection and classification algorithms. For human activity recognition systems, three main types are covered: single-person activity recognition, multiple-people interaction and crowd behavior, and abnormal activity recognition. Finally, the application domains are discussed in detail, specifically surveillance environments, entertainment environments, and healthcare systems, with particular attention to healthcare monitoring systems. Our survey, which aims to provide a comprehensive state-of-the-art review of the field, also addresses several challenges associated with these systems and applications.
