Automatic Video-based Analysis of Human Motion

Human motion carries valuable information in many situations, and people frequently, and often unconsciously, analyze the motion of others to understand their actions, intentions, and state of mind. Automatic analysis of human motion would enable many applications and has therefore received great interest from both industry and the research community. This thesis focuses on video-based analysis of human motion and presents work within three overall topics: foreground segmentation, action recognition, and human pose estimation.

Foreground segmentation is often the first important step in the analysis of human motion. Separating foreground from background allows the subsequent analysis to be focused and efficient. This thesis presents a robust background subtraction method that can be initialized with foreground objects in the scene and that handles foreground camouflage, shadows, and moving backgrounds. The method continuously updates the background model to maintain high-quality segmentation over long periods of time.

Within action recognition, the thesis presents work on recognition of both arm gestures and gait types. Arm gestures are recognized with a key-frame based approach. The method extracts a set of characteristic poses and describes them by their local motion, resulting in motion primitives. A probabilistic edit distance is used to classify a sequence of motion primitives as a gesture. This 2D recognition process is extended to view-invariant recognition of arm gestures by using a range camera, which generates 3D data and allows for a 3D equivalent of motion primitives. The recognition of gait types takes a different approach and extracts silhouettes that are matched against a database. A gait continuum is introduced to describe the whole range of gait, which addresses the inherent ambiguity between gait types.
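The gesture classification step described above can be illustrated with a small sketch. It is not the formulation used in the thesis, which the abstract does not specify; it assumes a weighted Levenshtein distance in which each observed motion primitive carries a detection confidence, so that deleting or substituting a low-confidence primitive is cheap. The names `weighted_edit_distance` and `classify`, the cost scheme, and the primitive labels are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: an observation is (primitive label, detection confidence in [0, 1]).
Observation = Tuple[str, float]


def weighted_edit_distance(observed: List[Observation], template: List[str]) -> float:
    """Confidence-weighted edit distance between observations and a gesture template."""
    n, m = len(observed), len(template)
    # dist[i][j]: cost of aligning the first i observations
    # with the first j template primitives.
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Deleting an observation costs its confidence.
        dist[i][0] = dist[i - 1][0] + observed[i - 1][1]
    for j in range(1, m + 1):
        # Inserting a missing template primitive costs 1.
        dist[0][j] = dist[0][j - 1] + 1.0
    for i in range(1, n + 1):
        label, conf = observed[i - 1]
        for j in range(1, m + 1):
            # A match is free; a mismatch costs the observation's confidence.
            substitute = 0.0 if label == template[j - 1] else conf
            dist[i][j] = min(
                dist[i - 1][j] + conf,            # delete observation
                dist[i][j - 1] + 1.0,             # insert template primitive
                dist[i - 1][j - 1] + substitute,  # match or substitute
            )
    return dist[n][m]


def classify(observed: List[Observation], gestures: Dict[str, List[str]]) -> str:
    """Return the gesture whose template has the lowest length-normalized distance."""
    return min(
        gestures,
        key=lambda name: weighted_edit_distance(observed, gestures[name]) / len(gestures[name]),
    )


# Hypothetical gesture templates and a noisy observed sequence.
gestures = {"point_right": ["A", "B", "C"], "raise_arm": ["D", "E"]}
observed = [("A", 0.9), ("X", 0.2), ("B", 0.8), ("C", 0.9)]
print(classify(observed, gestures))
```

In this sketch the spurious low-confidence primitive `X` adds only a small cost, so the sequence is still assigned to the template it otherwise matches. Normalizing by template length is one possible way to avoid favoring short gestures.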
Human pose estimation does not target a specific action but is considered a good basis for the recognition of any action. The pose estimation work presented in this thesis is mainly concerned with interacting people and the complex occlusions that interactions produce. A pose estimation method based on the pictorial structures framework is presented, in which body part detection combines edge and appearance information dynamically. Occluded body parts are detected by pruning the foreground mask into a mask of possible occlusions. A multi-view approach to pose estimation is also presented that integrates low-level information from different cameras to generate better pose estimates during heavy occlusions. The work presented in this thesis contributes to these different areas of video-based analysis of human motion and, taken together, brings fully automatic analysis and understanding of human motion closer.
