Agnostic Learning with Multiple Objectives

Most machine learning tasks are inherently multi-objective: the learner must produce a model that performs well across several base objectives $L_1, \ldots, L_p$, rather than a single one. Since optimizing with respect to multiple objectives simultaneously is often computationally expensive, the base objectives are typically combined into an ensemble $\sum_{k=1}^{p} \lambda_k L_k$, thereby reducing the problem to scalar optimization. The mixture weights $\lambda_k$ are set to uniform or to some other fixed distribution, based on the learner's preferences. We argue that learning with a fixed distribution on the mixture weights runs the risk of overfitting to some individual objectives and significantly harming others, despite performing well on the ensemble as a whole. Moreover, in practice, the true preferences of a learner across multiple objectives are often unknown or hard to express as a specific distribution. Instead, we propose a new framework of Agnostic Learning with Multiple Objectives (ALMO), in which a model is optimized for any mixture weights over the base objectives. We present data-dependent Rademacher complexity guarantees for learning in the ALMO framework, which guide a scalable optimization algorithm and the corresponding regularization. We give convergence guarantees for this algorithm, assuming convexity of the loss functions and of the underlying hypothesis space. We further implement the algorithm in a popular symbolic gradient computation framework and empirically demonstrate on a number of datasets the benefits of the ALMO framework over learning with a fixed mixture-weight distribution.
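Optimizing for any mixture weights amounts to a minimax problem: minimize over model parameters the worst-case ensemble loss $\max_{\lambda \in \Delta_p} \sum_{k=1}^{p} \lambda_k L_k(h)$, where $\Delta_p$ is the probability simplex. Below is a minimal sketch of this idea in PyTorch. The alternating descent/ascent scheme, the function names, and the step sizes are illustrative assumptions for exposition, not the paper's exact algorithm; the simplex projection is the standard routine of Duchi et al. (2008).

```python
import torch

def project_simplex(v):
    """Euclidean projection of a vector v onto the probability simplex
    (standard algorithm of Duchi et al., 2008)."""
    u, _ = torch.sort(v, descending=True)
    css = torch.cumsum(u, dim=0)
    ks = torch.arange(1, len(v) + 1, dtype=v.dtype, device=v.device)
    cond = u - (css - 1.0) / ks > 0
    rho = int(cond.nonzero().max())          # last index where cond holds
    tau = (css[rho] - 1.0) / (rho + 1)
    return torch.clamp(v - tau, min=0.0)

def almo_step(model, losses, lam, opt, lam_lr=0.01):
    """One hypothetical ALMO-style step: gradient descent on the model
    parameters and projected gradient ascent on the mixture weights lam,
    so the model is trained against the worst-case weighting.

    losses: list of callables, losses[k](model) -> scalar tensor L_k(h).
    lam:    1-D tensor of current mixture weights on the simplex.
    opt:    optimizer over model.parameters().
    """
    base = torch.stack([L(model) for L in losses])  # (L_1(h), ..., L_p(h))
    ensemble = torch.dot(lam, base)                 # sum_k lam_k * L_k(h)

    opt.zero_grad()
    ensemble.backward()
    opt.step()

    # Ascent on lam: the ensemble loss is linear in lam, so its gradient
    # with respect to lam is simply the vector of base losses.
    with torch.no_grad():
        lam = project_simplex(lam + lam_lr * base.detach())
    return lam
```

In a training loop, `lam` would typically be re-estimated on each mini-batch, so the model keeps tracking whichever objectives are currently worst served rather than a fixed preference over them.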
