A PAC-Bayesian bound for Lifelong Learning

Transfer learning has received considerable attention in the machine learning community in recent years, and several effective algorithms have been developed. However, relatively little is known about their theoretical properties, especially in the setting of lifelong learning, where the goal is to transfer information to tasks for which no data have been observed yet. In this work, we study lifelong learning from a theoretical perspective. Our main result is a PAC-Bayesian generalization bound that offers a unified view of existing paradigms for transfer learning, such as the transfer of parameters or the transfer of low-dimensional representations. We also use the bound to derive two principled lifelong learning algorithms, and we show that these yield results comparable to those of existing methods.
