Semi-supervised Learning Based on Distributionally Robust Optimization

We propose a novel method for semi-supervised learning (SSL) based on data-driven distributionally robust optimization (DRO) using optimal transport metrics. Our method improves generalization by using the unlabeled data to restrict the support of the worst-case distribution in the DRO formulation. To make the formulation practical, we propose a stochastic gradient descent algorithm that makes the training procedure easy to implement. We demonstrate that our semi-supervised DRO method reduces the generalization error relative to natural supervised procedures and state-of-the-art SSL estimators. Finally, we discuss the large-sample behavior of the optimal uncertainty region in the DRO formulation; this discussion highlights important aspects such as the role of dimension reduction in SSL.
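
As a rough illustration of the kind of problem described above, an optimal-transport DRO estimator with a restricted support can be sketched as follows. All notation here is assumed for illustration and is not taken from the text: P_n is the empirical distribution of the n labeled samples Z_i = (X_i, Y_i), \mathcal{X}_N is a finite candidate support built with the help of the unlabeled data, D_c is an optimal transport discrepancy with ground cost c, \delta is the radius of the uncertainty region, and \ell is the loss with parameter \beta. The second expression is the standard Lagrangian dual for a finite support, stated here as a sketch and not as the authors' exact derivation.

```latex
% Sketch only; every symbol is an assumption made for illustration.
% Primal: minimize the worst-case expected loss over distributions that are
% within transport budget \delta of P_n and supported on \mathcal{X}_N.
\min_{\beta} \;
\max_{\substack{P \,:\, D_c(P, P_n) \le \delta \\ \operatorname{supp}(P) \subseteq \mathcal{X}_N}}
\mathbb{E}_{P}\big[\ell(X, Y; \beta)\big]

% Dual (finite support): an average over the labeled samples of an inner
% maximum over candidate support points, plus a penalty on the budget.
\min_{\beta} \; \min_{\lambda \ge 0} \;
\Big\{ \lambda \delta + \frac{1}{n} \sum_{i=1}^{n}
\max_{z \in \mathcal{X}_N} \big[ \ell(z; \beta) - \lambda\, c(z, Z_i) \big] \Big\}
```

Because the dual objective is an average over the labeled samples, it is the kind of objective to which stochastic (sub)gradient steps in (\beta, \lambda) can plausibly be applied. Restricting the support to \mathcal{X}_N shrinks the set of adversarial distributions, which is one way to read the claim that unlabeled data makes the worst case less pessimistic.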
