Distributionally Robust Semi-supervised Learning

We propose a novel method for semi-supervised learning based on data-driven distributionally robust optimization (DRO) using optimal transport metrics. Our method improves generalization error by using the unlabeled data to restrict the support of the worst-case distribution in the DRO formulation. We make the DRO formulation practical by proposing a stochastic gradient descent algorithm that makes the training procedure easy to implement. We demonstrate the improvement in generalization error on semi-supervised extensions of regularized logistic regression and square-root LASSO. Finally, we discuss the large-sample behavior of the optimal uncertainty region in the DRO formulation. This discussion exposes important aspects such as the role of dimension reduction in semi-supervised learning.
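
The kind of problem described above can be sketched as follows. The notation is assumed here for illustration and is not taken from the abstract: $P_n$ is the empirical distribution of the labeled sample, $D_c$ is an optimal transport discrepancy with ground cost $c$, $\delta$ is the radius of the uncertainty region, $\ell$ is the loss with parameter $\beta$, and $\mathcal{X}_N$ is a finite set built from the labeled points together with the unlabeled predictors paired with each candidate label.

```latex
\[
\min_{\beta} \;
\sup_{\substack{P \,:\, D_c(P, P_n) \le \delta \\ \operatorname{supp}(P) \subseteq \mathcal{X}_N}}
\mathbb{E}_{P}\bigl[\ell(X, Y; \beta)\bigr]
\]
```

Under these assumptions, dropping the support constraint would let the inner supremum range over every distribution within distance $\delta$ of $P_n$. Keeping it limits the adversary to distributions supported on observed predictors, which is one way to read "restrict the support of the worst-case distribution".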
