Unsupervised Risk Estimation with only Structural Assumptions

Given a model θ and unlabeled samples from a distribution p∗, we show how to estimate the labeled risk of θ while making only structural (i.e., conditional independence) assumptions about p∗. This lets us estimate a model’s test error on distributions very different from its training distribution, and thus perform unsupervised domain adaptation without the covariate shift assumption that the true predictor remains constant. Furthermore, we can perform discriminative semi-supervised learning, even under model misspecification. Our technical tool is the method of moments, which allows us to exploit conditional independencies without relying on a specific parametric model. Finally, we introduce a new theoretical framework for grappling with the non-identifiability of the class identities, which is fundamental to unsupervised learning.
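
To make the method-of-moments idea concrete, the following is a minimal sketch of a much simpler toy case, not the algorithm of this paper. It assumes binary labels in {-1, +1} and three predictors ("views") whose mistakes are independent of each other given the label and symmetric across classes. Under those illustrative assumptions, E[f_i f_j] = a_i a_j with a_i = 1 - 2·err_i, so each error rate can be recovered from unlabeled second moments alone. The function names `estimate_error_rates` and `simulate` are hypothetical. Note that a and -a fit the moments equally well, which is a small instance of the class non-identifiability mentioned above; the sketch resolves it by assuming every view is better than chance.

```python
import numpy as np


def estimate_error_rates(preds):
    """Estimate each view's 0/1 error from unlabeled +/-1 predictions.

    preds: array of shape (n, 3). Assumes (illustratively) that the three
    views err independently given the label, with symmetric noise, and that
    every view beats chance, so the positive square root is the right one.
    """
    preds = np.asarray(preds, dtype=float)
    # Empirical second moments E[f_i f_j]; no labels are used anywhere.
    m = preds.T @ preds / preds.shape[0]
    m12, m13, m23 = m[0, 1], m[0, 2], m[1, 2]
    # Solve m_ij = a_i * a_j for a_i. With few samples or violated
    # assumptions a ratio can be negative, in which case sqrt returns nan.
    a = np.sqrt(np.array([m12 * m13 / m23,
                          m12 * m23 / m13,
                          m13 * m23 / m12]))
    return (1.0 - a) / 2.0


def simulate(n, accuracies, seed=0):
    """Draw labels and three views that are conditionally independent given y."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n, p=[0.3, 0.7])  # class balance is arbitrary
    correct = rng.random((n, len(accuracies))) < np.asarray(accuracies)
    preds = np.where(correct, y[:, None], -y[:, None])
    return y, preds


if __name__ == "__main__":
    y, preds = simulate(200_000, [0.9, 0.8, 0.7])
    print("estimated error:", estimate_error_rates(preds))
    print("true error:     ", (preds != y[:, None]).mean(axis=0))
```

The labels `y` are used only to check the estimate afterward. The abstract's setting is more general, since it makes only conditional independence assumptions and drops the symmetric-noise restriction used here.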
