Compact representation for large-scale unconstrained video analysis

Recently developed features (e.g., Fisher vector, VLAD) have achieved state-of-the-art performance in large-scale video analysis systems that aim to understand video content, for tasks such as concept recognition and event detection. However, these features are high-dimensional, which substantially increases computation costs and can degrade the performance of subsequent learning tasks. The situation becomes even worse for large-scale video data where the number of class labels is limited. To address this problem, we propose a novel algorithm to compactly represent huge amounts of unconstrained video data. Specifically, redundant feature dimensions are removed by our proposed feature selection algorithm. Because unlabeled videos are easy to obtain on the web, we apply this feature selection algorithm in a semi-supervised framework to cope with the shortage of class information. Unlike most existing semi-supervised feature selection algorithms, our proposed algorithm does not rely on manifold approximation, i.e., the graph Laplacian, which is expensive to compute for large amounts of data. Thus, it is possible to apply the proposed algorithm to a real large-scale video analysis system. In addition, because the non-smooth objective function is difficult to solve, we develop an efficient iterative approach to seek the global optimum. Extensive experiments are conducted on several real-world video datasets, including KTH, CCV, and HMDB. The experimental results demonstrate the effectiveness of the proposed algorithm.
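
The abstract does not state the objective function, so the following is only an illustrative sketch of the general idea of sparsity-based feature selection with a non-smooth regularizer. It assumes a supervised least-squares loss with an l2,1-norm penalty on the projection matrix W, solved by a standard iteratively reweighted scheme. It is not the proposed semi-supervised algorithm, and the function names and the parameters gamma, n_iter, and eps are hypothetical.

```python
import numpy as np


def l21_feature_selection(X, Y, gamma=1.0, n_iter=30, eps=1e-8):
    """Illustrative sketch (assumed objective, not the paper's):
        min_W ||X W - Y||_F^2 + gamma * ||W||_{2,1}
    X: (n_samples, n_features), Y: (n_samples, n_classes).
    Returns W and a per-feature score (the l2 norm of each row of W).
    """
    d = X.shape[1]
    XtX = X.T @ X
    XtY = X.T @ Y
    D = np.eye(d)  # reweighting matrix, starts as the identity (ridge step)
    for _ in range(n_iter):
        # With D fixed, the problem is a weighted ridge regression
        # with a closed-form solution.
        W = np.linalg.solve(XtX + gamma * D, XtY)
        # Update D from the current row norms; eps avoids division by zero.
        row_norms = np.sqrt((W ** 2).sum(axis=1))
        D = np.diag(1.0 / (2.0 * row_norms + eps))
    return W, row_norms


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, sorted."""
    return sorted(np.argsort(scores)[-k:].tolist())
```

Rows of W that shrink toward zero correspond to feature dimensions that contribute little to predicting the labels, so keeping only the highest-scoring rows gives a more compact representation. A semi-supervised variant would add a term that uses the unlabeled videos, which this sketch omits.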
