Transfer bounds for linear feature learning

If regression tasks are sampled from a distribution, then the expected error on a future task can be estimated, uniformly over a class of regularizing or pre-processing transformations, by the average of the empirical errors on the data of a finite sample of tasks. The bound is dimension-free, justifies optimization of the pre-processing feature map, and explains the circumstances under which learning-to-learn is preferable to single-task learning.
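
As a concrete illustration of the kind of procedure such a bound licenses, the sketch below learns a shared linear feature map from a sample of synthetic regression tasks and reuses it on a new task drawn from the same distribution. The data model, the SVD-based estimate of the shared map, and all names (sample_task, ridge, T) are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_tasks, n = 50, 3, 40, 20  # input dim, shared feature dim, number of tasks, samples per task

# Task environment: every task's regression vector lies in a common k-dimensional subspace.
U_true = np.linalg.qr(rng.standard_normal((d, k)))[0]

def sample_task():
    """Draw one regression task (inputs, noisy targets, true weight vector)."""
    w = U_true @ rng.standard_normal(k)
    X = rng.standard_normal((n, d))
    y = X @ w + 0.1 * rng.standard_normal(n)
    return X, y, w

def ridge(X, y, lam=0.1):
    """Ridge regression estimate of the weight vector."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

tasks = [sample_task() for _ in range(n_tasks)]

# Estimate a shared d x k pre-processing map T from the training tasks: here
# simply the top-k left singular subspace of the stacked per-task ridge
# solutions, a cheap stand-in for minimizing the average empirical error
# over a class of linear feature maps.
W_hat = np.column_stack([ridge(X, y) for X, y, _ in tasks])
T = np.linalg.svd(W_hat, full_matrices=False)[0][:, :k]

# A new task from the same environment: transfer vs. single-task learning.
X_new, y_new, w_new = sample_task()
w_transfer = T @ ridge(X_new @ T, y_new)   # regress in the learned feature space
w_single = ridge(X_new, y_new)             # plain single-task ridge

def excess_error(w_est):
    # For isotropic Gaussian test inputs the expected squared prediction
    # error gap equals the squared parameter distance.
    return float(np.sum((w_est - w_new) ** 2))

print("transfer estimate :", excess_error(w_transfer))
print("single-task ridge :", excess_error(w_single))
```

With many related tasks but few examples per task, the transfer estimate typically beats single-task ridge; with ample per-task data the advantage disappears, which is the trade-off the abstract refers to.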
