Bounds on the minimax rate for estimating a prior over a VC class from independent learning tasks

Abstract We study the optimal rates of convergence for estimating a prior distribution over a VC class from a sequence of independent data sets, each labeled by an independent target function sampled from the prior. Specifically, we derive upper and lower bounds on the optimal rates under a smoothness condition on the true prior, with the number of samples per data set equal to the VC dimension. These results have implications for the improvements achievable via transfer learning. We additionally extend this setting to real-valued functions, where we establish consistency of an estimator for the prior, and discuss an additional application to a preference elicitation problem in algorithmic economics.
