Human activity analysis

Human activity recognition is an important area of computer vision research. Its applications include surveillance systems, patient monitoring systems, and a variety of systems that involve interactions between persons and electronic devices such as human-computer interfaces. Most of these applications require an automated recognition of high-level activities, composed of multiple simple (or atomic) actions of persons. This article provides a detailed overview of various state-of-the-art research papers on human activity recognition. We discuss both the methodologies developed for simple human actions and those for high-level activities. An approach-based taxonomy is chosen that compares the advantages and limitations of each approach. Recognition methodologies for an analysis of the simple actions of a single person are first presented in the article. Space-time volume approaches and sequential approaches that represent and recognize activities directly from input images are discussed. Next, hierarchical recognition methodologies for high-level activities are presented and compared. Statistical approaches, syntactic approaches, and description-based approaches for hierarchical recognition are discussed in the article. In addition, we further discuss the papers on the recognition of human-object interactions and group activities. Public datasets designed for the evaluation of the recognition methodologies are illustrated in our article as well, comparing the methodologies' performances. This review will provide the impetus for future research in more productive areas.

[1]  Jake K. Aggarwal,et al.  Computer Analysis of Moving Polygonal Images , 1975, IEEE Transactions on Computers.

[2]  共立出版株式会社 コンピュータ・サイエンス : ACM computing surveys , 1978 .

[3]  Jake K. Aggarwal,et al.  Structure from Motion of Rigid and Jointed Objects , 1981, Artif. Intell..

[4]  James F. Allen Maintaining knowledge about temporal intervals , 1983, CACM.

[5]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[6]  Junji Yamato,et al.  Recognizing human action in time-sequential images using hidden Markov model , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Alex Pentland,et al.  Space-time gestures , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Edward H. Adelson,et al.  Analyzing and recognizing walking figures in XYT , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[9]  James F. Allen,et al.  Actions and Events in Interval Temporal Logic , 1994, J. Log. Comput..

[10]  Aaron F. Bobick,et al.  Recognition of human body motion using phase space constraints , 1995, Proceedings of IEEE International Conference on Computer Vision.

[11]  Mubarak Shah,et al.  Motion-based recognition a survey , 1995, Image Vis. Comput..

[12]  Alex Pentland,et al.  Real-time American Sign Language recognition from video using hidden Markov models , 1995 .

[13]  T D Albright,et al.  Visual motion perception. , 1995, Proceedings of the National Academy of Sciences of the United States of America.

[14]  Larry S. Davis,et al.  Towards 3-D model-based tracking and recognition of human movement: a multi-view approach , 1995 .

[15]  Aaron F. Bobick,et al.  A State-Based Approach to the Representation and Recognition of Gesture , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Claudio S. Pinhanez,et al.  Human action detection using PNF propagation of temporal constraints , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[17]  Aaron F. Bobick,et al.  A Framework for Recognizing Multi-Agent Action from Visual Evidence , 1999, AAAI/IAAI.

[18]  James L. Crowley,et al.  Probabilistic recognition of activity using local appearance , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[19]  Hyung Lee-Kwang,et al.  Modeling and recognition of hand gesture using colored Petri nets , 1999, IEEE Trans. Syst. Man Cybern. Part A.

[20]  Abbas K. Zaidi,et al.  On temporal logic programming using Petri nets , 1999, IEEE Trans. Syst. Man Cybern. Part A.

[21]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[22]  Michael J. Black,et al.  Parameterized Modeling and Recognition of Activities , 1999, Comput. Vis. Image Underst..

[23]  Irfan A. Essa,et al.  Exploiting human actions and object context for recognition tasks , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[24]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[25]  Jake K. Aggarwal,et al.  Human Motion Analysis: A Review , 1999, Comput. Vis. Image Underst..

[26]  Alex Pentland,et al.  A Bayesian Computer Vision System for Modeling Human Interactions , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[27]  Aaron F. Bobick,et al.  Recognition of Visual Activities and Interactions by Stochastic Parsing , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[28]  Mubarak Shah,et al.  View-invariance in action recognition , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[29]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[30]  Lihi Zelnik-Manor,et al.  Event-based analysis of video , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[31]  Jeffrey Mark Siskind,et al.  Grounding the Lexical Semantics of Verbs in Visual Perception using Force Dynamics and Event Logic , 1999, J. Artif. Intell. Res..

[32]  François Brémond,et al.  Group behavior recognition with multiple cameras , 2002, Sixth IEEE Workshop on Applications of Computer Vision, 2002. (WACV 2002). Proceedings..

[33]  Eric Horvitz,et al.  Layered representations for human activity recognition , 2002, Proceedings. Fourth IEEE International Conference on Multimodal Interfaces.

[34]  Irfan A. Essa,et al.  Recognizing multitasked activities from video using stochastic context-free grammar , 2002, AAAI/IAAI.

[35]  Irfan A. Essa,et al.  Expectation grammars: leveraging high-level expectations for activity recognition , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[36]  Rama Chellappa,et al.  Activity recognition using the dynamics of the configuration of interacting objects , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[37]  François Brémond,et al.  Automatic Video Interpretation: A Novel Algorithm for Temporal Scenario Recognition , 2003, IJCAI.

[38]  Ramakant Nevatia,et al.  Hierarchical Language-based Representation of Events in Video Streams , 2003, 2003 Conference on Computer Vision and Pattern Recognition Workshop.

[39]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[40]  Shaogang Gong,et al.  Recognition of group activities using dynamic probabilistic networks , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[41]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[42]  Jake K. Aggarwal,et al.  A hierarchical Bayesian network for event recognition of human actions and interactions , 2004, Multimedia Systems.

[43]  Yaser Sheikh,et al.  CASEE: A Hierarchical Event Representation for the Analysis of Videos , 2004, AAAI.

[44]  Ramakant Nevatia,et al.  Video-based event recognition: activity representation and probabilistic recognition methods , 2004, Comput. Vis. Image Underst..

[45]  Ram Nevatia,et al.  Automatic Tracking and Labeling of Human Activities in a Video Sequence , 2004 .

[46]  Yan Huang,et al.  Propagation networks for recognition of partially ordered sequential action , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[47]  Ramakant Nevatia,et al.  An Ontology for Video Event Representation , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[48]  Larry S. Davis,et al.  Representation and Recognition of Events in Surveillance Video Using Petri Nets , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[49]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[50]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[51]  Eli Shechtman,et al.  Space-time behavior based correlation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[52]  Mubarak Shah,et al.  Actions sketch: a novel action representation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[53]  Yaser Sheikh,et al.  Exploring the space of a human action , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[54]  Svetha Venkatesh,et al.  Learning and detecting activities from movement trajectories using the hierarchical hidden Markov model , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[55]  Mubarak Shah,et al.  Detecting group activities using rigidity of formation , 2005, MULTIMEDIA '05.

[56]  Longin Jan Latecki,et al.  2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS) , 2005, Research in Microelectronics and Electronics, 2005 PhD.

[57]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[58]  Mubarak Shah,et al.  Recognizing human actions in videos acquired by uncalibrated moving cameras , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[59]  Svetha Venkatesh,et al.  Combining image regions and human activity for indirect object recognition in indoor wide-angle views , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[60]  A. Sugimoto,et al.  Deleted Interpolation Using a Hierarchical Bayesian Grammar Network for Recognizing Human Activity , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[61]  Ashok Veeraraghavan,et al.  The Function Space of an Activity , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[62]  Jake K. Aggarwal,et al.  Semantic Understanding of Continued and Recursive Human Activities , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[63]  Octavia I. Camps,et al.  Activity Recognition from Silhouettes using Linear Systems and Model (In)validation Techniques , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[64]  Samy Bengio,et al.  Modeling individual and group actions in meetings with layered HMMs , 2006, IEEE Transactions on Multimedia.

[65]  Jake K. Aggarwal,et al.  Recognition of Composite Human Activities through Context-Free Grammar Based Representation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[66]  S. Shankar Sastry,et al.  Compressed Domain Real-time Action Recognition , 2006, 2006 IEEE Workshop on Multimedia Signal Processing.

[67]  Rama Chellappa,et al.  Attribute Grammar-Based Event Recognition and Anomaly Detection , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[68]  Jake K. Aggarwal,et al.  Detection of Fence Climbing from Monocular Video , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[69]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[70]  Ze-Nian Li,et al.  Successive Convex Matching for Action Detection , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[71]  Ramakant Nevatia,et al.  Single View Human Action Recognition using Key Pose Matching and Viterbi Path Searching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[72]  Ramakant Nevatia,et al.  Coupled Hidden Semi Markov Models for Activity Recognition , 2007, 2007 IEEE Workshop on Motion and Video Computing (WMVC'07).

[73]  Thomas Serre,et al.  A Biologically Inspired System for Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[74]  Jake K. Aggarwal,et al.  Detection of abandoned objects in crowded environments , 2007, 2007 IEEE Conference on Advanced Video and Signal Based Surveillance.

[75]  Takeo Kanade,et al.  Simultaneous registration and clustering for temporal segmentation of facial gestures from video , 2007, VISAPP.

[76]  P. L. Venetianer,et al.  Stationary target detection using the objectvideo surveillance system , 2007, 2007 IEEE Conference on Advanced Video and Signal Based Surveillance.

[77]  J. Santos-Victor,et al.  Detecting Luggage Related Behaviors Using a New Temporal Boost Algorithm ∗ , 2007 .

[78]  Christopher W. Geib,et al.  The meaning of action: a review on action recognition and mapping , 2007, Adv. Robotics.

[79]  Martial Hebert,et al.  Spatio-temporal Shape and Flow Correlation for Action Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[80]  Tae-Kyun Kim,et al.  Learning Motion Categories using both Semantic and Structural Information , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[81]  Jake K. Aggarwal,et al.  Hierarchical Recognition of Human Activities Interacting with Objects , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[82]  Tae-Kyun Kim,et al.  Tensor Canonical Correlation Analysis for Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[83]  Yoichi Sato,et al.  Recovering the Basic Structure of Human Activities from a Video-Based Symbol String , 2007, 2007 IEEE Workshop on Motion and Video Computing (WMVC'07).

[84]  Patrick Pérez,et al.  Retrieving actions in movies , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[85]  Larry S. Davis,et al.  Objects in Action: An Approach for Combining Action Understanding and Object Perception , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[86]  Mubarak Shah,et al.  A 3-dimensional sift descriptor and its application to action recognition , 2007, ACM Multimedia.

[87]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[88]  Zhu Li,et al.  Real-time human action recognition by luminance field trajectory analysis , 2008, ACM Multimedia.

[89]  Mubarak Shah,et al.  Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[90]  Majid Sarrafzadeh,et al.  Fast GPU-based space-time correlation for activity recognition in video sequences , 2008, 2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia.

[91]  Jake K. Aggarwal,et al.  Semantic Representation and Recognition of Continued and Recursive Human Activities , 2009, International Journal of Computer Vision.

[92]  Mubarak Shah,et al.  Learning human actions via information maximization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[93]  Guangyou Xu,et al.  Group Interaction Analysis in Dynamic Context , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[94]  J.K. Aggarwal,et al.  Recognition of High-level Group Activities Based on Activities of Individual Members , 2008, 2008 IEEE Workshop on Motion and video Computing.

[95]  Juan Carlos Niebles,et al.  Spatial-Temporal correlatons for unsupervised action classification , 2008, 2008 IEEE Workshop on Motion and video Computing.

[96]  Larry S. Davis,et al.  Event Modeling and Recognition Using Markov Logic Networks , 2008, ECCV.

[97]  Rama Chellappa,et al.  Machine Recognition of Human Activities: A Survey , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[98]  S. Gong,et al.  Recognising action as clouds of space-time interest points , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[99]  Dima Damen,et al.  Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[100]  Larry S. Davis,et al.  Understanding videos, constructing plots learning a visually grounded storyline model from annotated videos , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[101]  S. Kollias,et al.  Dense saliency-based spatiotemporal feature points for action recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[102]  Jake K. Aggarwal,et al.  Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[103]  Jiebo Luo,et al.  Recognizing realistic actions from videos “in the wild” , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[104]  Geoffrey E. Hinton,et al.  Learning Generative Texture Models with extended Fields-of-Experts , 2009, BMVC.

[105]  Larry S. Davis,et al.  Understanding videos, constructing plots learning a visually grounded storyline model from annotated videos , 2009, CVPR.

[106]  Peng Dai,et al.  Group Interaction Analysis in Dynamic Context$^{\ast}$ , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[107]  David Elliott,et al.  In the Wild , 2010 .