Recognizing Actions across Cameras by Exploring the Correlated Subspace

We present a novel transfer learning approach to cross-camera action recognition. Inspired by canonical correlation analysis (CCA), we first extract spatio-temporal visual words from videos captured at different views, and derive a correlation subspace as a joint representation for the bag-of-words models of those views. Unlike prior CCA-based approaches, which simply train standard classifiers such as SVMs in the resulting subspace, we explore the domain transfer ability of CCA in the correlation subspace, in which each dimension has a different capability in correlating source and target data. We propose a novel SVM with a correlation regularizer that incorporates this ability into the design of the SVM. Experiments on the IXMAS dataset verify the effectiveness of our method, which is shown to outperform state-of-the-art transfer learning approaches that do not take such domain transfer ability into consideration.
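To make the notion of a correlation subspace concrete, the following is a minimal sketch of regularized linear CCA on paired source-view and target-view feature vectors. It is not the authors' implementation: the function name `fit_cca`, the ridge term `reg`, and the synthetic data standing in for bag-of-words histograms are all assumptions made for illustration. The point is that CCA returns one correlation coefficient per subspace dimension, which is the per-dimension "capability in correlating source and target data" the abstract refers to.

```python
import numpy as np


def fit_cca(Xs, Xt, n_components, reg=1e-3):
    """Regularized linear CCA (illustrative sketch, hypothetical helper).

    Xs, Xt: arrays of shape (n_samples, d_s) and (n_samples, d_t) whose rows
    are paired observations of the same action from two views.
    Returns projection matrices Ps, Pt and the canonical correlations rho.
    """
    Xs = Xs - Xs.mean(axis=0)
    Xt = Xt - Xt.mean(axis=0)
    n = Xs.shape[0]

    # Covariances, with a small ridge term (an assumption) for stability.
    Css = Xs.T @ Xs / n + reg * np.eye(Xs.shape[1])
    Ctt = Xt.T @ Xt / n + reg * np.eye(Xt.shape[1])
    Cst = Xs.T @ Xt / n

    # Whiten each view with its Cholesky factor: M = Ls^{-1} Cst Lt^{-T}.
    Ls = np.linalg.cholesky(Css)
    Lt = np.linalg.cholesky(Ctt)
    M = np.linalg.solve(Ls, Cst)
    M = np.linalg.solve(Lt, M.T).T

    # Singular values of M are the canonical correlations, in descending order.
    U, rho, Vt = np.linalg.svd(M, full_matrices=False)

    # Map the singular vectors back to the original feature spaces.
    Ps = np.linalg.solve(Ls.T, U[:, :n_components])
    Pt = np.linalg.solve(Lt.T, Vt.T[:, :n_components])
    return Ps, Pt, rho[:n_components]


# Synthetic stand-in for paired bag-of-words features from two cameras:
# both views are noisy linear functions of a shared 3-dimensional latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
Xs = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))
Xt = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))

Ps, Pt, rho = fit_cca(Xs, Xt, n_components=5)

# Project both views into the shared correlation subspace.
Zs = (Xs - Xs.mean(axis=0)) @ Ps
Zt = (Xt - Xt.mean(axis=0)) @ Pt
print("canonical correlations per dimension:", np.round(rho, 3))
```

In this sketch, `rho` holds one value per subspace dimension, and dimensions with larger values align the two views more closely. The abstract does not give the form of the proposed correlation regularizer, so how these coefficients enter the SVM objective is deliberately left out here.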
