Paper information - Tensor rank selection for multimedia analysis

Tensor rank selection for multimedia analysis

We propose a novel tensor BOW model that represents the spatial structure information of multimedia data. We propose a new tensor-based framework that effectively reveals the discriminative knowledge along each order of the tensor. The rank of the tensor representation can be selected automatically. Two types of vector-based algorithms are extended to their tensor counterparts. We compare the proposed algorithms with state-of-the-art methods on three multimedia applications.

Tensor representations are widely used in multimedia applications. As a key step of tensor processing, the rank-1 tensor decomposition (i.e., the CANDECOMP/PARAFAC (CP) decomposition) always requires an estimate of the tensor rank. The ℓ2,1-norm has been shown to be effective for tensor rank selection. Existing tensor rank selection algorithms force the same columns of all factor matrices to become zero simultaneously. However, the truly sparse columns may differ from one factor matrix to another, so this strategy does not uncover the sparse information of each individual factor matrix. In this paper, we add a separable ℓ2,1-norm on the multiple factor matrices to obtain truly sparse results along the different modes. The different sparse results are then assembled into a joint sparse pattern for tensor rank selection. The added separable regularization term plays a twofold role: it strengthens the effect of regularization on each factor matrix, and it fully exploits the knowledge of multiple factor matrices to facilitate decision making. To effectively exploit the structure information of multimedia data, we propose a tensor bag of words (tBOW) model as the direct input of our algorithms. In the experiments, we apply the proposed algorithms to three representative multimedia analysis tasks: image classification, video action recognition, and head pose estimation. Experimental results on three open benchmark datasets show that our algorithms are effective for multimedia analysis.
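The idea of reading a tensor rank off sparse factor matrices can be illustrated with a small NumPy sketch. It is not the authors' algorithm: the abstract does not state how the per-mode sparse results are assembled, so the rule used here (keep a rank-1 component only if its column is non-zero in every factor matrix) is an assumption, as are the column-wise form of the ℓ2,1-norm, the tolerance `tol`, and the helper names `l21_norm_columns`, `active_columns`, and `select_rank`.

```python
import numpy as np

def l21_norm_columns(factor):
    """Column-wise l2,1-norm: sum of the Euclidean norms of the columns.

    Assumption: the group structure is taken over columns, since each column
    of a CP factor matrix corresponds to one rank-1 component.
    """
    return float(np.linalg.norm(factor, axis=0).sum())

def active_columns(factor, tol=1e-8):
    """Boolean mask marking columns whose Euclidean norm exceeds tol."""
    return np.linalg.norm(factor, axis=0) > tol

def select_rank(factors, tol=1e-8):
    """Combine per-mode sparsity masks into one joint pattern (hypothetical rule).

    A rank-1 component vanishes if its column is zero in any factor matrix,
    so a component is kept only if it is active in every mode.
    """
    masks = np.stack([active_columns(f, tol) for f in factors])
    joint = masks.all(axis=0)
    return int(joint.sum()), joint

# Toy factor matrices of a 3-way tensor with 3 candidate components.
# Each mode has a different sparse column, as the abstract describes.
A = np.array([[3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0]])   # component 2 is zero in mode 1
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 0.0]])   # component 3 is zero in mode 2
C = np.array([[1.0, 1.0, 2.0]])   # no zero column in mode 3

rank, joint = select_rank([A, B, C])
print(l21_norm_columns(A), rank, joint)
```

In this toy case the sparse columns differ between modes, and only the first component survives in the joint pattern, so the selected rank is 1.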

Jianmin Jiang | Yahong Han | Jianguang Zhang
