Boosting with Multiple Sources

We study the problem of learning accurate ensemble predictors, in particular boosting, in the presence of multiple source domains. We show that standard convex-combination ensembles in general cannot succeed in this scenario and instead adopt a domain-weighted combination. We introduce and analyze a new boosting algorithm for this scenario, MULTIBOOST, and show that it benefits from favorable theoretical guarantees. We also report the results of several experiments with our algorithm demonstrating that it outperforms natural baselines on multi-source text, image, and tabular data. We further present an extension of our algorithm to the federated learning scenario and report favorable experimental results for that setting as well. Additionally, we describe in detail an extension of our algorithm to the multi-class setting, MCMULTIBOOST, for which we also report experimental results.
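The distinction above can be made concrete with a minimal sketch (this is an illustration of the two ensemble forms, not the paper's actual MULTIBOOST algorithm): a standard convex combination applies one shared weight vector over the base predictors to every sample, whereas a domain-weighted combination lets each source domain use its own weight vector.

```python
# Illustrative sketch (not the paper's MULTIBOOST): contrast a standard
# convex combination of base predictors with a domain-weighted combination,
# where each source domain k has its own mixture weights alphas[k].

def normalize(weights):
    """Rescale nonnegative weights to sum to one (a point on the simplex)."""
    total = sum(weights)
    return [w / total for w in weights]

def convex_combination(base_scores, alpha):
    """One shared weight vector alpha over the base predictors.

    base_scores[i] holds the scores of all base predictors on sample i.
    """
    alpha = normalize(alpha)
    return [sum(a * s for a, s in zip(alpha, scores)) for scores in base_scores]

def domain_weighted_combination(base_scores, domain_ids, alphas):
    """Per-domain weight vectors: alphas[k] applies to samples from domain k."""
    alphas = [normalize(a) for a in alphas]
    return [sum(a * s for a, s in zip(alphas[k], scores))
            for k, scores in zip(domain_ids, base_scores)]

# Two base predictors that disagree: a shared convex combination must trade
# them off globally, while per-domain weights can favor a different base
# predictor on each source domain.
base = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(convex_combination(base, [1, 1]))                        # [0.5, 0.5, 0.5, 0.5]
print(domain_weighted_combination(base, [0, 0, 1, 1],
                                  [[1, 0], [0, 1]]))           # [1.0, 0.0, 0.0, 1.0]
```

The shared weights average the conflicting base predictors on every sample, while the domain-weighted combination can commit to the base predictor best suited to each domain, which is the flexibility the convex-combination impossibility result motivates.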
