Modeling and Recognition of Complex Human Activities

Activity recognition is a field of computer vision that has seen great progress in the past decade. Starting from simple single-person activities, research in activity recognition is moving toward more complex scenes involving multiple objects and natural environments. The main challenges in the task include localizing and recognizing events in a video and dealing with the large variation in viewpoint, speed of movement, and scale. This chapter gives the reader an overview of the work that has taken place in activity recognition, especially in the domain of complex activities involving multiple interacting objects. We begin with a description of the challenges in activity recognition and give a broad overview of the different approaches. We then go into the details of some of the feature descriptors and classification strategies commonly regarded as the state of the art in this field. Next, we move to more complex recognition systems, discussing the challenges in complex activity recognition and some of the work that has addressed them. Finally, we provide some examples of recent work in complex activity recognition. Recognizing complex behaviors involving multiple interacting objects is a very challenging problem, and future work needs to study its many aspects, including features, recognition strategies, models, robustness, and context.
