Probably Approximately Correct Constrained Learning

As learning solutions reach critical applications in social, industrial, and medical domains, the need to curtail their behavior becomes paramount. There is now ample evidence that without explicit tailoring, learning can lead to biased, unsafe, and prejudiced solutions. To tackle these problems, we develop a generalization theory of constrained learning based on the probably approximately correct (PAC) learning framework. In particular, we show that imposing requirements does not make a learning problem harder, in the sense that any PAC learnable class is also PAC constrained learnable using a constrained counterpart of the empirical risk minimization (ERM) rule. For typical parametrized models, however, this learner involves solving a non-convex optimization program for which even obtaining a feasible solution may be hard. To overcome this issue, we prove that under mild conditions the empirical dual problem of constrained learning is also a PAC constrained learner, and that it leads to a practical constrained learning algorithm. We analyze the generalization properties of this solution and use it to illustrate how constrained learning can address problems in fair and robust classification.
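To make the empirical dual idea concrete, the following is a minimal sketch under stated assumptions, not the algorithm analyzed in the paper. It assumes a linear model with logistic loss, a single constraint requiring the average loss on a subgroup to stay below a level `c`, and alternating gradient steps on the empirical Lagrangian (descent in the parameters, projected ascent in the dual variable). The function names, the synthetic data, the constraint, and the step sizes are all hypothetical choices made for illustration.

```python
import numpy as np

def logistic_loss(theta, X, y):
    """Per-sample logistic loss for labels y in {-1, +1}."""
    return np.logaddexp(0.0, -y * (X @ theta))

def logistic_grad(theta, X, y):
    """Gradient of the mean logistic loss with respect to theta."""
    m = y * (X @ theta)
    # 1 / (1 + exp(m)), written with tanh for numerical stability
    w = 0.5 * (1.0 - np.tanh(0.5 * m))
    return (X * (-y * w)[:, None]).mean(axis=0)

def primal_dual(X, y, group, c, steps=2000, lr_theta=0.1, lr_mu=0.05):
    """Alternating updates on the empirical Lagrangian (illustrative only).

    Objective:  mean loss over all samples.
    Constraint: mean loss over samples in `group` must not exceed c.
    """
    theta = np.zeros(X.shape[1])
    mu = 0.0  # dual variable for the single constraint, kept non-negative
    Xg, yg = X[group], y[group]
    for _ in range(steps):
        # Primal step: gradient descent on the Lagrangian in theta.
        grad = logistic_grad(theta, X, y) + mu * logistic_grad(theta, Xg, yg)
        theta = theta - lr_theta * grad
        # Dual step: gradient ascent on the constraint slack, projected onto mu >= 0.
        slack = logistic_loss(theta, Xg, yg).mean() - c
        mu = max(0.0, mu + lr_mu * slack)
    return theta, mu

def make_data(n, seed=0):
    """Synthetic two-feature data with a bias column and a fixed subgroup."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([rng.normal(size=(n, 2)), np.ones(n)])
    scores = X @ np.array([1.5, -1.0, 0.2]) + 0.5 * rng.normal(size=n)
    y = np.where(scores >= 0, 1.0, -1.0)
    group = np.arange(n) % 5 == 0  # every fifth sample forms the subgroup
    return X, y, group

X, y, group = make_data(200)
theta, mu = primal_dual(X, y, group, c=0.4)
print("overall loss :", logistic_loss(theta, X, y).mean())
print("subgroup loss:", logistic_loss(theta, X[group], y[group]).mean())
print("dual variable:", mu)
```

In this sketch the dual variable grows while the subgroup constraint is violated and shrinks toward zero once it is satisfied, so the weight placed on the requirement is adjusted during training rather than fixed in advance.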
