What Objective Does Self-paced Learning Indeed Optimize?

Self-paced learning (SPL) is a recently proposed methodology inspired by the learning principle of humans and animals. A variety of SPL realization schemes have been designed for different computer vision and pattern recognition tasks and have been empirically shown to be effective in these applications. However, the theoretical underpinnings of SPL remain largely unexplored. To address this issue, this study provides new theoretical understanding of the SPL scheme. Specifically, we prove that the SPL solution strategy corresponds to a majorization-minimization algorithm applied to a latent objective function. Furthermore, we find that the loss contained in this latent objective has a configuration similar to the non-convex regularized penalties (NSPR) known in statistics and machine learning. This connection inspires us to uncover closer relationships between SPL regimes and NSPR forms such as SCAD, LOG, and EXP, which in turn gives a clear explanation of the robustness of SPL. We also analyze SPL's capability to embed easy-loss priors and provide an insightful interpretation of the effectiveness mechanism underlying previous SPL variants. In addition, we design a group-partial-order loss prior that is especially useful for weakly labeled, large-scale data processing tasks. Applying SPL with this loss prior to the FCVID dataset, currently one of the largest manually annotated video datasets, our method achieves state-of-the-art performance beyond previous methods, which further supports the proposed theoretical arguments.
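
As a minimal illustration of the alternating SPL updates summarized above, the following Python sketch uses the classical hard self-paced regularizer, for which the sample-weight update has a closed form and the implied latent per-sample loss is the capped loss min(loss, lambda), a bounded non-convex penalty of the kind the abstract relates SPL to. The learner interface (`fit_weighted`, `per_sample_loss`) and the annealing schedule are assumptions introduced here for illustration, not details taken from the paper.

```python
import numpy as np

# Minimal sketch of self-paced learning (SPL) with the classical "hard"
# self-paced regularizer f(v; lam) = -lam * sum(v).  The names
# `fit_weighted` and `per_sample_loss` are hypothetical placeholders for
# whatever weighted learner and per-sample loss a given application uses.

def spl_weights_hard(losses, lam):
    """Closed-form weight update: v_i = 1 if loss_i < lam, else 0."""
    return (losses < lam).astype(float)

def latent_loss_hard(losses, lam):
    """Latent per-sample loss implied by the hard regularizer: min(loss, lam).
    This capped (truncated) loss is the bounded, non-convex penalty that the
    alternating SPL updates implicitly minimize, in the spirit of the
    majorization-minimization view discussed above."""
    return np.minimum(losses, lam)

def spl_train(X, y, fit_weighted, per_sample_loss, lam=1.0, growth=1.3, n_rounds=10):
    """Alternate between (1) the closed-form sample-weight update and
    (2) weighted model re-fitting, gradually increasing the pace parameter
    lam so that harder samples are admitted in later rounds."""
    v = np.ones(len(y))                       # start with all samples included
    model = None
    for _ in range(n_rounds):
        model = fit_weighted(X, y, v)         # step 2: minimize weighted loss
        losses = per_sample_loss(model, X, y) # per-sample losses under current model
        v = spl_weights_hard(losses, lam)     # step 1: keep only "easy" samples
        lam *= growth                         # anneal the pace parameter
    return model, v
```

Because the weight update is a simple threshold on the per-sample loss, samples whose loss exceeds the current pace parameter contribute at most lam to the latent objective, which is one way to read the robustness property mentioned above: the influence of grossly mislabeled or outlying samples is bounded.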
