Joint specific and correlated information exploration for multi-view action clustering

Abstract: Human action clustering is crucial in many practical applications. However, existing action clustering methods operate on a single view, so they ignore the relationships among different views and fail to discover correct clusters as the viewpoint and position change. To address these challenges, we propose a unified framework for multi-view human action clustering. First, we design a new Bag-of-Shared-Words (BoSW) model to discover view-shared visual words that preserve the consistency among the visual words of different views. This yields a more discriminative feature representation from which the view correlation can be fully explored. Then, we present a novel JOint INformation boTtleneck (JOINT) algorithm that jointly exploits view-specific and view-correlated information to improve action clustering performance. Specifically, JOINT formulates the problem as minimizing an information loss function, which compresses the actions of each view while jointly preserving the complementary view-specific information and the correlated information among views. To solve the proposed objective function, a new sequential procedure is presented that guarantees convergence to a locally optimal solution. Extensive experiments on three challenging multi-view single-person and interactive action datasets demonstrate the superiority of our algorithm.
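To make the phrase "minimizing an information loss function" concrete, the following is a minimal single-view sketch of the classical information bottleneck idea for hard clustering. It is not the JOINT objective, whose exact multi-view form is not given in this abstract. The function names, the toy joint distribution over actions and visual words, and the cluster labels are all hypothetical and chosen only for illustration.

```python
import math


def mutual_information(joint):
    """Return I(A;B) in bits for a joint distribution given as a list of rows."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:
                mi += p * math.log2(p / (row[i] * col[j]))
    return mi


def merge_by_cluster(joint, labels, n_clusters):
    """Sum the rows of p(x, y) that share a cluster label to obtain p(t, y)."""
    merged = [[0.0] * len(joint[0]) for _ in range(n_clusters)]
    for r, t in zip(joint, labels):
        for j, p in enumerate(r):
            merged[t][j] += p
    return merged


def information_loss(joint, labels, n_clusters):
    """Information about Y lost by compressing X into clusters T: I(X;Y) - I(T;Y)."""
    clustered = merge_by_cluster(joint, labels, n_clusters)
    return mutual_information(joint) - mutual_information(clustered)


# Hypothetical toy data: 4 actions (rows) described by 2 visual words (columns).
# Entries are joint probabilities p(action, word) and sum to 1.
toy_joint = [
    [0.20, 0.05],
    [0.20, 0.05],
    [0.05, 0.20],
    [0.05, 0.20],
]

# Grouping actions with the same word distribution loses no information.
good_loss = information_loss(toy_joint, [0, 0, 1, 1], 2)
# Mixing actions with different word distributions loses all of it.
bad_loss = information_loss(toy_joint, [0, 1, 0, 1], 2)
```

In this sketch a lower loss means the clusters retain more of the information that the visual words carry about the actions. A multi-view method such as the one described above would additionally need terms that account for view-specific and view-correlated information, which this single-view example does not attempt to model.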
