Human Pose Estimation and Activity Recognition From Multi-View Videos: Comparative Explorations of Recent Developments

This paper presents a review and comparative study of recent multi-view approaches for human 3D pose estimation and activity recognition. We first discuss the application domains of human pose estimation and activity recognition and their associated requirements, covering advanced human–computer interaction (HCI), assisted living, gesture-based interactive games, intelligent driver assistance systems, movies, 3D TV and animation, physical therapy, autonomous mental development, smart environments, sport motion analysis, video surveillance, and video annotation. Next, we review and categorize recent approaches that have been proposed to meet these requirements. We then report a comparison of the most promising methods for multi-view human action recognition on two publicly available datasets: the INRIA Xmas Motion Acquisition Sequences (IXMAS) Multi-View Human Action Dataset and the i3DPost Multi-View Human Action and Interaction Dataset. We also give a qualitative assessment of methods that cannot be compared quantitatively, and analyze some prominent 3D pose estimation techniques for applications where not only the performed action must be identified but a more detailed description of the body pose and joint configuration is also required. Finally, we discuss some of the shortcomings of multi-view camera setups and outline our thoughts on future directions of 3D body pose estimation and human action recognition.
