Identifiability of Priors from Bounded Sample Sizes with Applications to Transfer Learning

We explore a transfer learning setting in which a finite sequence of target concepts is sampled independently according to an unknown distribution belonging to a known family. We study the total number of labeled examples required to learn all targets to an arbitrary specified expected accuracy, focusing on the asymptotics in the number of tasks and in the desired accuracy. Our primary interest is in formally understanding the fundamental benefits of transfer learning, compared to learning each target independently of the others. Our approach to the transfer problem is general, in the sense that it can be used with a variety of learning protocols. The key insight driving our approach is that the distribution of the target concepts is identifiable from the joint distribution over a number of random labeled data points equal to the Vapnik-Chervonenkis dimension of the concept space. This is not necessarily the case for the joint distribution over any smaller number of points. This work has particularly interesting implications when applied to active learning methods.
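
As a concrete rendering of the key insight, the following is a hedged formalization in assumed notation: the concept space C, its VC dimension d, the data distribution D, the unknown prior pi, and the random target h* are our labels for the objects in the abstract, not necessarily the paper's own symbols.

  % Sketch of the identifiability claim, under the assumed notation above.
  % A target h* ~ pi is drawn once, X_1, ..., X_d are i.i.d. from D, and Y_i = h*(X_i).
  \[
    \mathcal{L}_{\pi}\bigl((X_1,Y_1),\dots,(X_d,Y_d)\bigr)
      \;=\; \mathcal{L}_{\pi'}\bigl((X_1,Y_1),\dots,(X_d,Y_d)\bigr)
    \quad\Longrightarrow\quad \pi = \pi',
    \qquad d = \mathrm{vc}(\mathcal{C}).
  \]
  % For fewer than d points the implication can fail: two distinct priors may
  % induce the same joint law on (X_1, Y_1), ..., (X_{d-1}, Y_{d-1}).

As an illustrative special case (ours, not the abstract's): for threshold classifiers on the real line, where d = 1, the law of a single labeled point (X, 1[X >= theta]) with theta ~ pi already determines the prior's CDF at almost every x via P(Y = 1 | X = x) = pi((-inf, x]), provided the marginal of X is fixed and has full support.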
