Transfer bounds for linear feature learning

If regression tasks are sampled from a distribution, then the expected error on a future task can be estimated, uniformly over a class of regularizing or pre-processing transformations, by the average of the empirical errors on the data of a finite sample of tasks. The bound is dimension-free, justifies optimization of the pre-processing feature map, and explains the circumstances under which learning-to-learn is preferable to single-task learning.
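
As a concrete illustration of the kind of procedure such a bound licenses, the sketch below learns a shared linear feature map from a sample of synthetic regression tasks and reuses it on a new task drawn from the same distribution. The data model, the SVD-based estimate of the shared map, and all names (sample_task, ridge, T) are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_tasks, n = 50, 3, 40, 20  # input dim, shared feature dim, number of tasks, samples per task

# Task environment: every task's regression vector lies in a common k-dimensional subspace.
U_true = np.linalg.qr(rng.standard_normal((d, k)))[0]

def sample_task():
    """Draw one regression task (inputs, noisy targets, true weight vector)."""
    w = U_true @ rng.standard_normal(k)
    X = rng.standard_normal((n, d))
    y = X @ w + 0.1 * rng.standard_normal(n)
    return X, y, w

def ridge(X, y, lam=0.1):
    """Ridge regression estimate of the weight vector."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

tasks = [sample_task() for _ in range(n_tasks)]

# Estimate a shared d x k pre-processing map T from the training tasks: here
# simply the top-k left singular subspace of the stacked per-task ridge
# solutions, a cheap stand-in for minimizing the average empirical error
# over a class of linear feature maps.
W_hat = np.column_stack([ridge(X, y) for X, y, _ in tasks])
T = np.linalg.svd(W_hat, full_matrices=False)[0][:, :k]

# A new task from the same environment: transfer vs. single-task learning.
X_new, y_new, w_new = sample_task()
w_transfer = T @ ridge(X_new @ T, y_new)   # regress in the learned feature space
w_single = ridge(X_new, y_new)             # plain single-task ridge

def excess_error(w_est):
    # For isotropic Gaussian test inputs the expected squared prediction
    # error gap equals the squared parameter distance.
    return float(np.sum((w_est - w_new) ** 2))

print("transfer estimate :", excess_error(w_transfer))
print("single-task ridge :", excess_error(w_single))
```

With many related tasks but few examples per task, the transfer estimate typically beats single-task ridge; with ample per-task data the advantage disappears, which is the trade-off the abstract refers to.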
