Meta-Learning PAC-Bayes Priors in Model Averaging

Model uncertainty has become one of the most important problems in both academia and industry. In this paper, we consider the scenario in which a common model set is used for model averaging, rather than selecting a single final model via a model selection procedure, in order to account for model uncertainty and improve the reliability and accuracy of inferences. One main challenge here is to learn the prior over the model set. To tackle this problem, we propose two data-based algorithms for obtaining proper priors for model averaging. The first is for the meta-learner: analysts use historical similar tasks to extract information about the prior. The second is for the base-learner: a subsampling method processes the data step by step. Theoretically, we present an upper bound on the risk of our algorithm, which guarantees its performance in the worst case. In practice, both methods perform well in simulations and real data studies, especially with poor-quality data.
