Learning and recognizing human dynamics in video sequences

This paper describes a probabilistic decomposition of human dynamics at multiple levels of abstraction, and shows how to propagate hypotheses across space, time, and abstraction levels. Recognition in this framework proceeds from very general low-level grouping mechanisms to increasingly specific, learned, model-based grouping techniques at higher levels. Hard decision thresholds are delayed and resolved by higher-level statistical models and temporal context. Low-level primitives are areas of coherent motion found by EM clustering, mid-level categories are simple movements represented by dynamical systems, and high-level complex gestures are represented by Hidden Markov Models as successive phases of simple movements. We show how such a representation can be learned from training data, and apply it to the example of human gait recognition.
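
As an illustration of the highest level only, the following is a minimal sketch of how a Hidden Markov Model can score a sequence of mid-level movement labels. It is not the paper's implementation: the two-phase model, the discrete observation symbols, all probability values, and the function name forward_log_likelihood are assumptions made for this example. In the framework described above the observations would come from dynamical-system models rather than from hand-assigned symbols.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | HMM).

    obs   -- list of discrete observation indices
    start -- start[i] = P(first state is i)
    trans -- trans[i][j] = P(next state is j | current state is i)
    emit  -- emit[i][k] = P(observing symbol k | state i)
    """
    n = len(start)
    # alpha[i] is proportional to P(state i at time t, observations up to t)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the new observation.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Renormalise to avoid underflow; accumulate the log of the scale.
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical two-phase model: each phase tends to persist and
# mostly emits its own movement label (0 or 1). Values are made up.
start = [1.0, 0.0]
trans = [[0.7, 0.3], [0.3, 0.7]]
emit = [[0.9, 0.1], [0.1, 0.9]]

smooth = forward_log_likelihood([0, 0, 1, 1], start, trans, emit)
jittery = forward_log_likelihood([0, 1, 0, 1], start, trans, emit)
```

Under these assumed parameters, a label sequence that stays in each phase for a while scores higher than one that alternates at every frame, which is the kind of temporal context a higher-level model can use to resolve ambiguous low-level decisions.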
