Laplacian LRR on Product Grassmann Manifolds for Human Activity Clustering in Multicamera Video Surveillance

In multicamera video surveillance, it is challenging to represent videos from different cameras properly and to fuse them efficiently for applications such as human activity recognition and clustering. This paper proposes a novel representation for multicamera video data, the product Grassmann manifold (PGM), which models each video sequence as a point on a Grassmann manifold and combines the sequences as a whole in product manifold form. In addition, using a new geometric metric on the product manifold, the conventional low-rank representation (LRR) model is extended to the PGM, and the resulting LRR model can be used to cluster nonlinear data such as multicamera video data. To evaluate the proposed method, clustering experiments are conducted on several multicamera video data sets of human activity, including the Dongzhimen Transport Hub Crowd action data set, the ACT 42 Human Action data set, and the SKIG action data set. The experimental results show that the proposed method outperforms many state-of-the-art clustering methods.
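The representation described above can be illustrated with a minimal sketch. It is written under assumptions that the abstract does not confirm: each camera's clip is a matrix with one vectorized frame per column, the Grassmann point is the span of its top-p left singular vectors, and the product-manifold distance is a weighted sum of squared projection-embedding distances. The function names (`grassmann_point`, `embedding_dist_sq`, `product_dist_sq`) and the weighting scheme are hypothetical and are not taken from the paper.

```python
import numpy as np


def grassmann_point(frames, p):
    """Return a (d x p) orthonormal basis representing a clip as a Grassmann point.

    frames: (d, n_frames) array, one vectorized frame per column (assumed layout).
    """
    u, _, _ = np.linalg.svd(frames, full_matrices=False)
    return u[:, :p]


def embedding_dist_sq(x, y):
    """Squared projection-embedding distance 0.5 * ||X X^T - Y Y^T||_F^2.

    For orthonormal X and Y with p columns this equals p - ||X^T Y||_F^2,
    which avoids forming the d x d projection matrices.
    """
    p = x.shape[1]
    return p - np.linalg.norm(x.T @ y, "fro") ** 2


def product_dist_sq(xs, ys, weights=None):
    """Squared distance between two product-manifold points.

    xs, ys: lists holding one Grassmann point per camera.
    weights: optional per-camera weights (hypothetical; defaults to all ones).
    """
    if weights is None:
        weights = np.ones(len(xs))
    return float(sum(w * embedding_dist_sq(x, y)
                     for w, x, y in zip(weights, xs, ys)))


# Two samples, each observed by 3 cameras, built from random stand-in data.
rng = np.random.default_rng(0)
sample_a = [grassmann_point(rng.standard_normal((20, 6)), p=3) for _ in range(3)]
sample_b = [grassmann_point(rng.standard_normal((20, 6)), p=3) for _ in range(3)]
d_ab = product_dist_sq(sample_a, sample_b)
```

A pairwise matrix of such distances is the kind of geometric quantity that a manifold-aware LRR model could be built on; how the paper actually incorporates its metric into the LRR objective is not specified in this abstract.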
