Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions

System theoretic approaches to action recognition model the dynamics of a scene with linear dynamical systems (LDSs) and perform classification using metrics on the space of LDSs, e.g. Binet-Cauchy kernels. However, such approaches are only applicable to time series data living in a Euclidean space, e.g. joint trajectories extracted from motion capture data or feature point trajectories extracted from video. Much of the success of recent object recognition techniques relies on the use of more complex feature descriptors, such as SIFT descriptors or HOG descriptors, which are essentially histograms. Since histograms live in a non-Euclidean space, we can no longer model their temporal evolution with LDSs, nor can we classify them using a metric for LDSs. In this paper, we propose to represent each frame of a video using a histogram of oriented optical flow (HOOF) and to recognize human actions by classifying HOOF time-series. For this purpose, we propose a generalization of the Binet-Cauchy kernels to nonlinear dynamical systems (NLDS) whose output lives in a non-Euclidean space, e.g. the space of histograms. This can be achieved by using kernels defined on the original non-Euclidean space, leading to a well-defined metric for NLDSs. We use these kernels for the classification of actions in video sequences using (HOOF) as the output of the NLDS. We evaluate our approach to recognition of human actions in several scenarios and achieve encouraging results.

[1]  R. Shumway,et al.  AN APPROACH TO TIME SERIES SMOOTHING AND FORECASTING USING THE EM ALGORITHM , 1982 .

[2]  C. R. Rao,et al.  Information and the Accuracy Attainable in the Estimation of Statistical Parameters , 1992 .

[3]  B. Moor,et al.  Subspace identification for linear systems , 1996 .

[4]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[5]  Leonidas J. Guibas,et al.  A metric for distributions with applications to image databases , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[6]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[7]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[8]  David G. Stork,et al.  Pattern Classification (2nd ed.) , 1999 .

[9]  Jake K. Aggarwal,et al.  Human Motion Analysis: A Review , 1999, Comput. Vis. Image Underst..

[10]  B. Moor,et al.  Subspace angles and distances between ARMA models , 2000 .

[11]  Thomas B. Moeslund,et al.  A Survey of Computer Vision-Based Human Motion Capture , 2001, Comput. Vis. Image Underst..

[12]  Stefano Soatto,et al.  Recognition of human gaits , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[13]  Sung-Hyuk Cha,et al.  On measuring the distance between histograms , 2002, Pattern Recognit..

[14]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[15]  Stefano Soatto,et al.  Dynamic Textures , 2003, International Journal of Computer Vision.

[16]  S. Lazebnik,et al.  Local Features and Kernels for Classification of Texture and Object Categories: An In-Depth Study , 2005 .

[17]  Mubarak Shah,et al.  Actions sketch: a novel action representation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[18]  Nuno Vasconcelos,et al.  Probabilistic kernels for the classification of auto-regressive visual processes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[20]  Ivan Laptev,et al.  On Space-Time Interest Points , 2005, International Journal of Computer Vision.

[21]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories , 2006 .

[22]  Alexander J. Smola,et al.  Binet-Cauchy Kernels on Dynamical Systems and its Application to the Analysis of Dynamic Scenes , 2007, International Journal of Computer Vision.

[23]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[24]  Mubarak Shah,et al.  Chaotic Invariants for Human Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[25]  Ronen Basri,et al.  Actions as Space-Time Shapes , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Anuj Srivastava,et al.  Riemannian Analysis of Probability Density Functions with Applications in Vision , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Stefano Soatto,et al.  Classification and Recognition of Dynamical Models: The Role of Phase, Independent Components, Kernels and Optimal Transport , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Nuno Vasconcelos,et al.  Classifying Video with Kernel Dynamic Textures , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Luc Van Gool,et al.  An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector , 2008, ECCV.

[30]  David A. Forsyth,et al.  Searching for Complex Human Activities with No Visual Examples , 2008, International Journal of Computer Vision.

[31]  Du Tran,et al.  Human Activity Recognition with Metric Learning , 2008, ECCV.

[32]  René Vidal,et al.  Recognition of Visual Dynamical Processes: Theory, Kernels, and Experimental Evaluation , 2009 .