Learning to Detect Concepts from Webly-Labeled Video Data

Learning detectors that can recognize concepts, such as people actions, objects, etc., in video content is an interesting but challenging problem. In this paper, we study the problem of automatically learning detectors from the big video data on the web without any additional manual annotations. The contextual information available on the web provides noisy labels to the video content. To leverage the noisy web labels, we propose a novel method called WEbly-Labeled Learning (WELL). It is established on two theories called curriculum learning and self-paced learning and exhibits useful properties that can be theoretically verified. We provide compelling insights on the latent non-convex robust loss that is being minimized on the noisy data. In addition, we propose two novel techniques that not only enable WELL to be applied to big data but also lead to more accurate results. The efficacy and the scalability of WELL have been extensively demonstrated on two public benchmarks, including the largest multimedia dataset and the largest manually-labeled video set. Experimental results show that WELL significantly outperforms the state-of-the-art methods. To the best of our knowledge, WELL achieves by far the best reported performance on these two webly-labeled big video datasets.

[1]  Xinlei Chen,et al.  Webly Supervised Learning of Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Kathrin Klamroth,et al.  Biconvex sets and optimization with biconvex functions: a survey and extensions , 2007, Math. Methods Oper. Res..

[3]  Deyu Meng,et al.  What Objective Does Self-paced Learning Indeed Optimize? , 2015, ArXiv.

[4]  Apostol Natsev,et al.  Efficient Large Scale Video Classification , 2015, ArXiv.

[5]  Shiguang Shan,et al.  Self-Paced Curriculum Learning , 2015, AAAI.

[6]  J. Friedman Stochastic gradient boosting , 2002 .

[7]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[8]  Tong Zhang,et al.  Analysis of Multi-stage Convex Relaxation for Sparse Regularization , 2010, J. Mach. Learn. Res..

[9]  R. Tibshirani,et al.  PATHWISE COORDINATE OPTIMIZATION , 2007, 0708.1485.

[10]  Shiguang Shan,et al.  Self-Paced Learning with Diversity , 2014, NIPS.

[11]  Jieping Ye,et al.  A General Iterative Shrinkage and Thresholding Algorithm for Non-convex Regularized Optimization Problems , 2013, ICML.

[12]  David A. Shamma,et al.  The New Data and New Challenges in Multimedia Research , 2015, ArXiv.

[13]  Fei-Fei Li,et al.  OPTIMOL: Automatic Online Picture Collection via Incremental Model Learning , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Daphne Koller,et al.  Self-Paced Learning for Latent Variable Models , 2010, NIPS.

[15]  Joan Bruna,et al.  Training Convolutional Networks with Noisy Labels , 2014, ICLR 2014.

[16]  Deyu Meng,et al.  Bridging the Ultimate Semantic Gap: A Semantic Search Engine for Internet Videos , 2015, ICMR.

[17]  Xinlei Chen,et al.  Never-Ending Learning , 2012, ECAI.

[18]  ScienceDirect Computational statistics & data analysis , 1983 .

[19]  Cun-Hui Zhang Nearly unbiased variable selection under minimax concave penalty , 2010, 1002.4734.

[20]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Daphne Koller,et al.  Learning specific-class segmentation from diverse data , 2011, 2011 International Conference on Computer Vision.

[22]  Deva Ramanan,et al.  Self-Paced Learning for Long-Term Tracking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Ali Farhadi,et al.  Learning Everything about Anything: Webly-Supervised Visual Concept Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Qi Xie,et al.  Self-Paced Learning for Matrix Factorization , 2015, AAAI.

[25]  Deyu Meng,et al.  Easy Samples First: Self-paced Reranking for Zero-Example Multimedia Search , 2014, ACM Multimedia.

[26]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[27]  Yunchao Wei,et al.  Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28]  Valentin I. Spitkovsky,et al.  Baby Steps: How “Less is More” in Unsupervised Dependency Parsing , 2009 .

[29]  Yi Yang,et al.  Fast and Accurate Content-based Semantic Search in 100M Internet Videos , 2015, ACM Multimedia.

[30]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[31]  Julien Mairal,et al.  Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization , 2013, NIPS.

[32]  Shih-Fu Chang,et al.  Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[34]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[35]  Fei-Fei Li,et al.  Shifting Weights: Adapting Object Detectors from Image to Video , 2012, NIPS.

[36]  Zhaohui Zheng,et al.  Stochastic gradient boosted distributed decision trees , 2009, CIKM.

[37]  Lorenzo Torresani,et al.  Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach , 2010, NIPS.