Information-theoretic analysis for transfer learning

Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different distributions (denoted μ and μ', respectively). In this work, we give an information-theoretic analysis of the generalization error and the excess risk of transfer learning algorithms, following a line of work initiated by Russo and Zou. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence D(μ‖μ') plays an important role in characterizing the generalization error in the setting of domain adaptation. Specifically, we provide generalization error upper bounds for general transfer learning algorithms, and extend the results to a specific empirical risk minimization (ERM) algorithm where data from both distributions are available in the training phase. We further apply the method to iterative, noisy gradient descent algorithms and obtain upper bounds that can be computed easily using only parameters of the learning algorithms. A few illustrative examples are provided to demonstrate the usefulness of the results. In particular, for certain classification problems our bound is tighter than a bound derived using Rademacher complexity.
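
To make the quantities in the abstract concrete, the sketch below fixes notation: the training sample S = (Z_1, ..., Z_n) is drawn i.i.d. from the source distribution μ, the population risk is evaluated under the target distribution μ', and W is the hypothesis returned by the algorithm P_{W|S}. The displayed inequality shows a representative form that bounds in this line of work take, combining a per-sample mutual information term with a distribution-shift term; the subgaussian constant σ and the exact placement and direction of the KL term are illustrative assumptions here, not a quotation of the paper's theorems.

```latex
% Generalization error for transfer learning: train on S ~ mu^n, test under mu'.
\mathrm{gen}(\mu', P_{W\mid S})
  = \mathbb{E}\big[L_{\mu'}(W)\big] - \mathbb{E}\big[L_{S}(W)\big],
\qquad
L_{\mu'}(w) = \mathbb{E}_{Z\sim\mu'}\big[\ell(w,Z)\big],
\quad
L_{S}(w) = \frac{1}{n}\sum_{i=1}^{n}\ell(w,Z_i).

% Representative bound, assuming \ell(w,Z) is \sigma-subgaussian:
\big|\mathrm{gen}(\mu', P_{W\mid S})\big|
  \le \sqrt{\,2\sigma^{2}\Big(D(\mu\,\|\,\mu') + \tfrac{1}{n}\, I(W;S)\Big)}.
```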

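For the iterative, noisy gradient descent case, the bounds mentioned in the abstract can be evaluated from algorithm parameters alone. The snippet below is a minimal sketch of such a computation, assuming the standard per-iteration mutual-information bound for noisy gradient descent with Gaussian perturbations and uniformly bounded gradients (in the style of Pensia et al., 2018) together with the representative bound form shown above; the function names, constants, and example numbers are illustrative and not taken from the paper.

```python
import math


def mi_bound_noisy_gd(step_sizes, noise_stds, grad_bound, dim):
    """Upper bound on I(W_T; S) for noisy GD with isotropic Gaussian noise.

    Uses the per-iteration bound sum_t (d/2) * log(1 + eta_t^2 L^2 / (d sigma_t^2)),
    where L is an assumed uniform bound on the gradient norm.
    """
    total = 0.0
    for eta, sigma in zip(step_sizes, noise_stds):
        total += 0.5 * dim * math.log1p((eta * grad_bound) ** 2 / (dim * sigma ** 2))
    return total


def transfer_gen_bound(mi_bound, kl_div, n, subgauss_sigma):
    """Representative transfer bound: sqrt(2*sigma^2*(D(mu||mu') + I(W;S)/n))."""
    return math.sqrt(2.0 * subgauss_sigma ** 2 * (kl_div + mi_bound / n))


# Example: 100 iterations with constant step size and noise level, a loss bounded
# in [0, 1] (hence 1/2-subgaussian), and an assumed KL divergence of 0.05 between
# the source and target distributions.
mi = mi_bound_noisy_gd(step_sizes=[0.01] * 100, noise_stds=[0.1] * 100,
                       grad_bound=1.0, dim=10)
print(transfer_gen_bound(mi_bound=mi, kl_div=0.05, n=1000, subgauss_sigma=0.5))
```
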
[1] Mark W. Schmidt et al. Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization. NIPS, 2011.

[2] Koby Crammer et al. A theory of learning from different domains. Machine Learning, 2010.

[3] Philip S. Yu et al. Adaptation Regularization: A General Framework for Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 2014.

[4] Gert R. G. Lanckriet et al. On the empirical estimation of integral probability metrics. 2012.

[5] André Elisseeff et al. Stability and Generalization. Journal of Machine Learning Research, 2002.

[6] James Zou et al. Controlling Bias in Adaptive Data Analysis Using Information Theory. AISTATS, 2015.

[7] Maxim Raginsky et al. Information-theoretic analysis of generalization capability of learning algorithms. NIPS, 2017.

[8] R. Moddemeijer. On estimation of entropy and mutual information of continuous distributions. 1989.

[9] Yanjun Han et al. Dependence measures bounding the exploration bias for general measurements. 2017 IEEE International Symposium on Information Theory (ISIT).

[10] Koby Crammer et al. Learning Bounds for Domain Adaptation. NIPS, 2007.

[11] Lei Zhang et al. Generalization Bounds for Domain Adaptation. NIPS, 2012.

[12] Yuchen Zhang et al. Bridging Theory and Algorithm for Domain Adaptation. ICML, 2019.

[13] Shaofeng Zou et al. Tightening Mutual Information Based Bounds on Generalization Error. 2019 IEEE International Symposium on Information Theory (ISIT).

[14] Qiang Yang et al. Boosting for transfer learning. ICML, 2007.

[15] Sergio Verdú et al. Chaining Mutual Information and Tightening Generalization Bounds. NeurIPS, 2018.