Semisupervised Discriminant Multimanifold Analysis for Action Recognition

Although recent semisupervised approaches have proven effective when training data are limited, they assume that samples from different actions lie on a single data manifold in the feature space, and they try to uncover a common subspace for all samples. However, this assumption ignores both intraclass compactness and interclass separability. We believe that human actions should occupy a multimanifold subspace and, therefore, model the samples of the same action as one manifold and those of different actions as different manifolds. When computing the optimal subspace projection matrix, current approaches may be mathematically imprecise owing to badly scaled matrices and improper convergence. To address these issues in unconstrained convex optimization, we introduce a nontrivial spectral projected gradient method and Karush–Kuhn–Tucker conditions without matrix inversion. By maximizing the separability between different classes using labeled data points and estimating the intrinsic geometric structure of the data distributions using unlabeled data points, the proposed algorithm learns global and local consistency and boosts recognition performance. Extensive experiments on realistic video data sets, including JHMDB, HMDB51, UCF50, and UCF101, demonstrate that our algorithm outperforms the compared algorithms, including a deep learning approach, when only a few labeled samples are available.
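
For readers unfamiliar with the spectral projected gradient idea mentioned above, the following is a minimal generic sketch, not the authors' algorithm. It assumes a small toy quadratic objective with a nonnegativity constraint; the function name `spg_minimize`, the toy matrices `A` and `b`, and the plain Armijo backtracking (in place of a nonmonotone line search) are all illustrative choices. It shows how a Barzilai-Borwein ("spectral") step length lets the iteration proceed using only gradients and projections, with no matrix inversion.

```python
import numpy as np

def spg_minimize(f, grad, project, x0, max_iter=500, tol=1e-8):
    """Generic spectral projected gradient sketch (illustrative only)."""
    x = project(np.asarray(x0, dtype=float))
    g = grad(x)
    step = 1.0  # initial spectral step length (assumed value)
    for _ in range(max_iter):
        # Projected gradient direction; zero at a constrained stationary point.
        d = project(x - step * g) - x
        if np.linalg.norm(d) < tol:
            break
        # Simple monotone Armijo backtracking along d.
        t, fx, gd = 1.0, f(x), g @ d
        while f(x + t * d) > fx + 1e-4 * t * gd and t > 1e-12:
            t *= 0.5
        x_new = x + t * d
        g_new = grad(x_new)
        # Barzilai-Borwein step: uses only differences of iterates and
        # gradients, so no matrix is ever inverted.
        s, y = x_new - x, g_new - g
        sy = s @ y
        step = (s @ s) / sy if sy > 1e-12 else 1.0
        step = min(max(step, 1e-10), 1e10)  # safeguard the step length
        x, g = x_new, g_new
    return x

# Toy problem (hypothetical): minimize 0.5 x^T A x - b^T x subject to x >= 0.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, 1.0])
x_star = spg_minimize(
    f=lambda x: 0.5 * x @ A @ x - b @ x,
    grad=lambda x: A @ x - b,
    project=lambda x: np.maximum(x, 0.0),  # projection onto the nonnegative orthant
    x0=[1.0, 1.0],
)
print(x_star)
```

In this toy problem the objective is separable, so the constrained minimizer can be checked by hand: the first coordinate is clipped to zero and the second equals 1.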
