The Sample Complexity of Meta Sparse Regression

This paper addresses the meta-learning problem in sparse linear regression with infinitely many tasks. We assume that the learner has access to several similar tasks and aims to transfer knowledge from these prior tasks to a similar but novel task. For p parameters, a support set of size k, and l samples per task, we show that T \in O((k \log p)/l) tasks suffice to recover the common support shared by all tasks. With the recovered support, the sample complexity of estimating the parameter of the novel task drops dramatically, namely l \in O(1) with respect to T and p. We also prove that our rates are minimax optimal. A key difference between meta-learning and classical multi-task learning is that meta-learning focuses only on recovering the parameters of the novel task, whereas multi-task learning estimates the parameters of all tasks, which requires l to grow with T. In contrast, our efficient meta-learning estimator allows l to remain constant with respect to T (i.e., few-shot learning).
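
The two-stage idea can be illustrated with a minimal sketch. This is not the paper's exact estimator: here the common support is estimated by running the Lasso on each prior task and keeping the coordinates selected most often across tasks, and the novel task is then fit by ordinary least squares restricted to that support. All function names, constants, and the voting rule below are illustrative assumptions.

# Minimal sketch (assumed, not the paper's estimator): support recovery from
# T prior tasks, then few-shot estimation on a novel task restricted to the
# recovered support.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, k, T, l = 200, 5, 100, 15                         # dimension, support size, prior tasks, samples/task
support = np.sort(rng.choice(p, size=k, replace=False))   # common support shared by all tasks

def sample_task(n):
    """One task: Gaussian design, k-sparse parameter supported on the common support, Gaussian noise."""
    w = np.zeros(p)
    w[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(0.5, 1.5, size=k)
    X = rng.normal(size=(n, p))
    y = X @ w + 0.1 * rng.normal(size=n)
    return X, y, w

# Stage 1: estimate the common support from the T prior tasks (T*l samples in total).
votes = np.zeros(p)
for _ in range(T):
    X, y, _ = sample_task(l)
    coef = Lasso(alpha=0.1, max_iter=10000).fit(X, y).coef_
    votes += (np.abs(coef) > 1e-6)
S_hat = np.argsort(votes)[-k:]                       # k most frequently selected coordinates

# Stage 2: the novel task now only needs a constant number of samples with respect
# to T and p, since a k-dimensional least-squares problem replaces the p-dimensional one.
X_new, y_new, w_new = sample_task(2 * k)
w_hat = np.zeros(p)
w_hat[S_hat], *_ = np.linalg.lstsq(X_new[:, S_hat], y_new, rcond=None)

print("recovered support:", np.sort(S_hat))
print("true support:     ", support)
print("novel-task estimation error:", np.linalg.norm(w_hat - w_new))

The majority-vote aggregation is only one convenient way to pool support information across tasks; the point of the sketch is the separation of stages, with the expensive support-recovery cost amortized over the prior tasks and the novel task solved in the low-dimensional restricted model.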
