Pairwise Kernels for Human Interaction Recognition

In this paper we model two-person interactions as temporal interaction trajectories: time series that couple the body motion of each individual with their proximity relationships. Each trajectory is modeled with a non-linear dynamical system (NLDS). We develop a framework based on so-called pairwise kernels, which compare interaction trajectories in the space of NLDS. To do so, we address the problem of modeling the Riemannian structure of the trajectory space, and we prove that the kernels must satisfy certain symmetry properties that are peculiar to this interaction modeling framework. Experimental results show that the approach is promising: it matches or improves on state-of-the-art classification and retrieval accuracies on two human interaction datasets.
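
To illustrate the kind of symmetry a pairwise kernel may need, here is a minimal sketch in Python. It is not the paper's kernel: the abstract does not give its form, so the sketch uses the generic symmetrized tensor-product construction known from the pairwise-kernel literature. It also assumes each individual is summarized by a plain feature vector and compared with an RBF base kernel, whereas the paper compares NLDS models. The names `rbf` and `pairwise_kernel` are hypothetical.

```python
import numpy as np


def rbf(x, y, gamma=0.5):
    """Base kernel between two individual descriptors.

    Assumption: descriptors are plain vectors; the paper instead
    compares non-linear dynamical systems.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))


def pairwise_kernel(pair1, pair2, base=rbf):
    """Symmetrized tensor-product kernel between two interactions.

    Each interaction is an (individual, individual) tuple. Summing
    both possible matchings makes the value independent of the order
    in which the two people of an interaction are listed.
    """
    a, b = pair1
    c, d = pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


# Toy descriptors for two interactions (illustrative values only).
a, b = [0.0, 1.0], [1.0, 0.0]
c, d = [0.2, 0.9], [1.1, -0.1]

k_ab_cd = pairwise_kernel((a, b), (c, d))
k_ba_cd = pairwise_kernel((b, a), (c, d))  # people swapped in first pair
```

In this sketch, swapping the two people within an interaction leaves the kernel value unchanged, which is one example of the symmetry requirement the abstract refers to. The properties actually proved in the paper may differ.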
