Representation and recognition of complex human motion

A vision system capable of representing and recognizing arbitrary motions benefits from a low-dimensional, non-specific representation of flow fields for use in high-level classification tasks. We present Zernike polynomials as an ideal candidate for such a representation. The Zernike basis is complete and orthogonal, and can describe many types of motion at many scales. Starting from image sequences, we derive locally smooth image velocities using a robust estimation procedure and compute compact representations of the flow from them using the Zernike basis. Continuous-density hidden Markov models are trained on the resulting temporal sequences of vectors and are used for subsequent classification. We present results of our method applied to image sequences of facial expressions, both with and without significant rigid head motion, and to sequences of lip motion from a known database. We demonstrate that the Zernike representation yields results competitive with those obtained using principal components, while not committing to specific types of motion. It is therefore ideal as a fundamental building block for a vision system capable of classifying arbitrary motion types.
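
To make the idea of a compact Zernike representation of a flow field concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes a square flow patch mapped onto the unit disk, the standard real-valued Zernike polynomials, and an ordinary least-squares fit of each velocity component; the function names (`zernike_radial`, `zernike_basis`, `flow_to_zernike`) and the `max_order` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np
from math import factorial


def zernike_radial(n, m, r):
    """Radial part R_n^|m|(r) of the Zernike polynomial (n - |m| must be even)."""
    m = abs(m)
    out = np.zeros_like(r, dtype=float)
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + m) // 2 - k)
                    * factorial((n - m) // 2 - k)))
        out += coeff * r ** (n - 2 * k)
    return out


def zernike_basis(size, max_order):
    """Sample all Zernike polynomials up to max_order on a size x size grid.

    Returns a (num_pixels_in_disk, num_polynomials) matrix and the boolean
    mask selecting the grid points that lie inside the unit disk.
    """
    ys, xs = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    r = np.hypot(xs, ys)
    theta = np.arctan2(ys, xs)
    mask = r <= 1.0
    columns = []
    for n in range(max_order + 1):
        for m in range(-n, n + 1, 2):  # m has the same parity as n
            radial = zernike_radial(n, m, r[mask])
            # Cosine terms for m >= 0, sine terms for m < 0.
            angular = np.cos(m * theta[mask]) if m >= 0 else np.sin(-m * theta[mask])
            columns.append(radial * angular)
    return np.stack(columns, axis=1), mask


def flow_to_zernike(u, v, max_order=4):
    """Fit the horizontal (u) and vertical (v) flow of a square patch.

    Assumption for this sketch: coefficients are obtained by least squares
    over the pixels inside the unit disk. The result concatenates the u and
    v coefficients into one feature vector per frame.
    """
    basis, mask = zernike_basis(u.shape[0], max_order)
    coeff_u, *_ = np.linalg.lstsq(basis, u[mask], rcond=None)
    coeff_v, *_ = np.linalg.lstsq(basis, v[mask], rcond=None)
    return np.concatenate([coeff_u, coeff_v])


if __name__ == "__main__":
    # Synthetic expanding flow: u = x, v = y on the unit disk.
    ys, xs = np.mgrid[-1:1:32j, -1:1:32j]
    features = flow_to_zernike(xs, ys, max_order=4)
    print(features.shape)
```

With `max_order=4` there are 15 basis functions, so each frame yields a 30-dimensional vector. Computing one such vector per frame produces the kind of temporal sequence on which a continuous-density hidden Markov model could then be trained.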
