The Pessimistic Limits and Possibilities of Margin-based Losses in Semi-supervised Learning

Consider a classification problem in which both labeled and unlabeled data are available. We show that for linear classifiers defined by convex margin-based surrogate losses that are decreasing, it is impossible to construct \emph{any} semi-supervised approach that is able to guarantee an improvement over the supervised classifier, where improvement is measured by this surrogate loss on the labeled and unlabeled data. For convex margin-based loss functions that also increase over part of their domain (i.e., that are not monotonically decreasing), we demonstrate that safe improvements \emph{are} possible.
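The two ingredients of this result can be made concrete with a small sketch. One is whether a margin-based loss phi(m), with margin m = y w^T x and y in {-1, +1}, is decreasing. The other is a pessimistic notion of improvement: a semi-supervised solution must lower the surrogate loss on labeled plus unlabeled data for every possible labeling of the unlabeled objects. The function names below (`is_decreasing`, `worst_case_gain`) are hypothetical and are not taken from the paper. The brute-force enumeration of hard labelings is an illustrative assumption that is only feasible for a handful of unlabeled objects. Because the loss difference is linear in soft labels, its worst case over soft labels in [0, 1] is always attained at some hard labeling.

```python
import itertools
import numpy as np

# Margin-based surrogate losses phi(m), with m = y * (w @ x) and y in {-1, +1}.
def hinge(m):
    return np.maximum(0.0, 1.0 - m)

def logistic(m):
    return np.log1p(np.exp(-m))

def squared(m):
    # Quadratic loss: decreasing for m < 1 but increasing for m > 1.
    return (1.0 - m) ** 2

def is_decreasing(phi, grid=np.linspace(-5.0, 5.0, 1001)):
    """Numerically check that phi never increases on the grid (hypothetical helper)."""
    values = phi(grid)
    return bool(np.all(np.diff(values) <= 1e-12))

def total_loss(phi, w, X, y):
    """Sum of phi(y_i * w^T x_i) over the given objects."""
    return float(np.sum(phi(y * (X @ w))))

def worst_case_gain(phi, w_semi, w_sup, X_lab, y_lab, X_unl):
    """Smallest loss reduction of w_semi over w_sup across all labelings of X_unl.

    A positive value means w_semi is guaranteed to be better on labeled plus
    unlabeled data, whatever the true labels of the unlabeled objects are.
    """
    X_all = np.vstack([X_lab, X_unl])
    gains = []
    for labels in itertools.product([-1.0, 1.0], repeat=len(X_unl)):
        y_all = np.concatenate([y_lab, np.array(labels)])
        gains.append(total_loss(phi, w_sup, X_all, y_all)
                     - total_loss(phi, w_semi, X_all, y_all))
    return min(gains)
```

For example, `is_decreasing` returns `True` for `hinge` and `logistic` and `False` for `squared`. With one labeled object `x = 1, y = +1`, one unlabeled object `x = 1`, and `w_sup = [1.0]`, the candidate `w_semi = [2.0]` has a worst-case gain of `-1.0` under the hinge loss. It loses if the unlabeled object turns out to be negative, so it is not a safe improvement.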