On Optimality of Meta-Learning in Fixed-Design Regression with Weighted Biased Regularization

We consider fixed-design linear regression in the meta-learning model of Baxter (2000) and establish a problem-dependent finite-sample lower bound on the transfer risk (the risk on a newly observed task) that holds for all estimators. Moreover, we prove that a weighted form of biased regularization, a popular technique in transfer and meta-learning, is optimal: it enjoys a problem-dependent upper bound on the risk that matches our lower bound up to a constant. Thus, our bounds characterize meta-learning linear regression problems and reveal a fine-grained dependence on the task structure. Our characterization suggests that, in the non-asymptotic regime and for a sufficiently large number of tasks, meta-learning can be considerably superior to single-task learning. Finally, we propose a practical adaptation of the optimal estimator via an Expectation-Maximization procedure and demonstrate its effectiveness in a series of experiments.
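To make the regularizer concrete, here is a minimal NumPy sketch of a weighted biased-regularization estimator of the kind discussed above; the function name, the weighting matrix A, and the toy data are illustrative assumptions, not the paper's exact estimator or its EM adaptation.

```python
import numpy as np

def biased_ridge(X, y, w0, lam, A=None):
    """Weighted biased-regularization estimator (illustrative sketch).

    Solves  min_w ||X w - y||^2 + lam * (w - w0)^T A (w - w0):
    ridge regression that shrinks toward a bias vector w0 rather than
    toward zero; A is a positive-definite weighting matrix (identity
    recovers plain biased ridge regression).
    """
    d = X.shape[1]
    if A is None:
        A = np.eye(d)
    # Closed-form solution of the regularized least-squares objective:
    # (X^T X + lam * A) w = X^T y + lam * A w0.
    return np.linalg.solve(X.T @ X + lam * A, X.T @ y + lam * A @ w0)

# Toy usage: w0 could be a meta-learned mean of previously seen task
# vectors, as in learning-to-learn around a common mean.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.ones(5) + 0.1 * rng.normal(size=20)
w0 = np.full(5, 0.9)  # hypothetical meta-learned bias vector
w_hat = biased_ridge(X, y, w0, lam=1.0)
```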

[1] Ilja Kuzborskij, et al. Fast Rates by Transferring from Auxiliary Hypotheses, 2014, Machine Learning.

[2] Christoph H. Lampert, et al. Lifelong Learning with Non-i.i.d. Tasks, 2015, NIPS.

[3] Mikhail Khodak, et al. A Sample Complexity Separation between Non-Convex and Convex Meta-Learning, 2020, ICML.

[4] Xiao Li, et al. A Bayesian Divergence Prior for Classifier Adaptation, 2007, AISTATS.

[5] Jonathan Baxter, et al. Theoretical Models of Learning to Learn, 1998, Learning to Learn.

[6] Joshua B. Tenenbaum, et al. One-Shot Learning with a Hierarchical Nonparametric Bayesian Model, 2011, ICML Unsupervised and Transfer Learning.

[7] Sham M. Kakade, et al. Few-Shot Learning via Learning the Representation, Provably, 2020, ICLR.

[8] D. Rubin, et al. Maximum Likelihood from Incomplete Data via the EM Algorithm (with discussion), 1977, Journal of the Royal Statistical Society, Series B.

[9] Maria-Florina Balcan, et al. Provable Guarantees for Gradient-Based Meta-Learning, 2019, ICML.

[10] Rong Yan, et al. Cross-Domain Video Concept Detection Using Adaptive SVMs, 2007, ACM Multimedia.

[11] Pan Zhou, et al. How Important is the Train-Validation Split in Meta-Learning?, 2020, ICML.

[12] Shai Ben-David. Domain Adaptation as Learning with Auxiliary Information, 2013.

[13] Andreas Maurer, et al. Algorithmic Stability and Meta-Learning, 2005, J. Mach. Learn. Res.

[14] Aryan Mokhtari, et al. On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms, 2019, AISTATS.

[15] L. Schmetterer. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 1963.

[16] Barbara Caputo, et al. Safety in Numbers: Learning Categories from Few Examples with Multi Model Knowledge Transfer, 2010, CVPR.

[17] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.

[18] Massimiliano Pontil, et al. Learning To Learn Around A Common Mean, 2018, NeurIPS.

[19] Kumar Chellapilla, et al. Personalized Handwriting Recognition via Biased Regularization, 2006, ICML.

[20] Pierre Alquier, et al. Regret Bounds for Lifelong Learning, 2016, AISTATS.

[21] Sergey Levine, et al. Online Meta-Learning, 2019, ICML.

[22] Christoph H. Lampert, et al. A PAC-Bayesian Bound for Lifelong Learning, 2013, ICML.

[23] Massimiliano Pontil, et al. Learning-to-Learn Stochastic Gradient Descent with Biased Regularization, 2019, ICML.

[24] Jonathan Baxter, et al. A Model of Inductive Bias Learning, 2000, J. Artif. Intell. Res.

[25] J. Bretagnolle, et al. Estimation des densités: risque minimax, 1978.

[26] Massimiliano Pontil, et al. The Benefit of Multitask Representation Learning, 2015, J. Mach. Learn. Res.

[27] Massimiliano Pontil, et al. Online-Within-Online Meta-Learning, 2019, NeurIPS.

[28] Ilja Kuzborskij, et al. Stability and Hypothesis Transfer Learning, 2013, ICML.

[29] Toniann Pitassi, et al. Theoretical Bounds on Estimation Error for Meta-Learning, 2020, arXiv.

[30] Paolo Frasconi, et al. Bilevel Programming for Hyperparameter Optimization and Meta-Learning, 2018, ICML.

[31] Ron Meir, et al. Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory, 2017, ICML.

[32] Ruth Urner, et al. Lifelong Learning with Weighted Majority Votes, 2016, NIPS.

[33] Barnabás Póczos, et al. Hypothesis Transfer Learning via Transformation Functions, 2016, NIPS.

[34] Steve Hanneke, et al. A No-Free-Lunch Theorem for Multitask Learning, 2020, The Annals of Statistics.

[35] Thomas L. Griffiths, et al. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes, 2018, ICLR.