Online geometric human interaction segmentation and recognition

We address the problem of online temporal segmentation and recognition of human interactions in video sequences. Interactions are represented by high-dimensional data with complex variability, which we handle by combining kernel methods with linear models, giving rise to kernel regression and kernel state-space models. By exploiting the geometry of linear operators in Hilbert space, we show how the concept of parity space, defined for linear models, generalizes to their kernelized extensions. This provides a powerful and flexible framework for online temporal segmentation and recognition. We extensively evaluate the approach on a publicly available dataset and on a new, challenging human interaction dataset that we have collected. The results show that the approach holds promise as an effective building block for the real-time analysis of human behavior.
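To make the parity-space idea concrete, the following minimal sketch covers only the plain linear case, which the abstract takes as its starting point. It is an illustration under stated assumptions and not the method of the paper: the model matrices `H` and `H_other`, the dimensions, and the helper names are all hypothetical, and the kernelized extension is not shown. For a linear model y = Hx, the parity space is the orthogonal complement of the column space of H. Projecting an observation onto it gives a residual that vanishes when the observation is consistent with the model and is nonzero otherwise, which is what makes such a residual usable as a segmentation cue.

```python
import numpy as np


def parity_projector(H):
    """Orthogonal projector onto the parity space of H.

    The parity space is the left null space of H, i.e. the orthogonal
    complement of its column space. The pseudo-inverse also covers a
    rank-deficient H.
    """
    n = H.shape[0]
    return np.eye(n) - H @ np.linalg.pinv(H)


def parity_residual_norm(P, y):
    """Norm of the part of y that the model behind P cannot explain."""
    return float(np.linalg.norm(P @ y))


# Hypothetical setup: two different linear models with 20 outputs, 3 states.
rng = np.random.default_rng(0)
H = rng.standard_normal((20, 3))        # model assumed for the current segment
H_other = rng.standard_normal((20, 3))  # model of a different segment

P = parity_projector(H)
x = np.ones(3)

r_same = parity_residual_norm(P, H @ x)         # observation from the same model
r_other = parity_residual_norm(P, H_other @ x)  # observation from another model

print(f"residual, same model:  {r_same:.2e}")
print(f"residual, other model: {r_other:.2e}")
```

In an online setting, one would presumably compute such a residual over a sliding window and compare it against a threshold; the choice of threshold and window length is left open here.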
