Robust Conditional Probabilities

Conditional probabilities are a core concept in machine learning. For example, optimal prediction of a label $Y$ given an input $X$ corresponds to maximizing the conditional probability of $Y$ given $X$. A common approach to inference tasks is to learn a model of conditional probabilities. However, these models often rest on strong assumptions (e.g., log-linear models), so their estimates of conditional probabilities are not robust and depend heavily on the validity of those assumptions. Here we propose a framework for reasoning about conditional probabilities without assuming anything about the underlying distributions, except knowledge of their second-order marginals, which can be estimated from data. We show how this setting leads to guaranteed bounds on conditional probabilities, which can be calculated efficiently in a variety of settings, including structured prediction. Finally, we apply these bounds to semi-supervised deep learning, obtaining results competitive with variational autoencoders.
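
To make the idea of a guaranteed bound concrete, the following minimal sketch computes elementary Fréchet-style bounds on $P(Y=y \mid X_1=x_1, X_2=x_2)$ using only the three pairwise marginals of $(X_1, X_2, Y)$. It is an illustration under simplifying assumptions (two discrete features, all pairwise marginals known exactly), not the procedure developed in the paper; the resulting intervals are valid but not necessarily tight. The function names `pairwise_marginals` and `conditional_bounds` and the toy distribution `example_joint` are hypothetical.

```python
def pairwise_marginals(joint):
    """Second-order marginals of a joint table keyed by (x1, x2, y)."""
    mu_1y, mu_2y, mu_12 = {}, {}, {}
    for (x1, x2, y), p in joint.items():
        mu_1y[(x1, y)] = mu_1y.get((x1, y), 0.0) + p
        mu_2y[(x2, y)] = mu_2y.get((x2, y), 0.0) + p
        mu_12[(x1, x2)] = mu_12.get((x1, x2), 0.0) + p
    return mu_1y, mu_2y, mu_12


def conditional_bounds(mu_1y, mu_2y, mu_12, x1, x2, y):
    """Bounds on P(Y=y | X1=x1, X2=x2) that hold for every joint
    distribution consistent with the given pairwise marginals."""
    p_x = mu_12.get((x1, x2), 0.0)
    if p_x <= 0.0:
        raise ValueError("conditioning event has zero probability")
    labels = sorted({label for (_, label) in mu_1y})

    def joint_bounds(label):
        # Frechet bounds on P(x1, x2, label):
        #   P(A and B and C) >= P(A and C) + P(B and C) - P(C)
        #   P(A and B and C) <= min of the pairwise probabilities
        a = mu_1y.get((x1, label), 0.0)
        b = mu_2y.get((x2, label), 0.0)
        p_label = sum(p for (_, lab), p in mu_1y.items() if lab == label)
        return max(0.0, a + b - p_label), min(a, b, p_x)

    low, high = joint_bounds(y)
    # The joint probabilities over all labels must sum to P(x1, x2),
    # so bounds on the other labels tighten the bounds on this one.
    others = [joint_bounds(lab) for lab in labels if lab != y]
    low = max(low, p_x - sum(upper for _, upper in others))
    high = min(high, p_x - sum(lower for lower, _ in others))
    return max(0.0, low / p_x), min(1.0, high / p_x)


# Toy joint distribution (an assumption for illustration only).
example_joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05,
    (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10,
    (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}

mu_1y, mu_2y, mu_12 = pairwise_marginals(example_joint)
for (x1, x2, y), p in sorted(example_joint.items()):
    low, high = conditional_bounds(mu_1y, mu_2y, mu_12, x1, x2, y)
    print((x1, x2, y), "true:", p / mu_12[(x1, x2)], "bounds:", (low, high))
```

In this sketch the joint table is used only to generate the marginals and to check the result; `conditional_bounds` itself never sees it. Because the interval depends on the marginals alone, the true conditional probability of any consistent distribution must fall inside it.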
