Multiple Source Adaptation and the Rényi Divergence

This paper presents a novel theoretical study of the general problem of multiple source adaptation using the notion of Rényi divergence. Our results build on our previous work [12], but significantly broaden the scope of that work in several directions. We extend previous multiple-source loss guarantees based on distribution-weighted combinations to arbitrary target distributions P, not necessarily mixtures of the source distributions, analyze both the known and unknown target distribution cases, and prove a lower bound. We further extend our bounds to the case where the learner receives an approximate distribution for each source instead of the exact one, and show that similar loss guarantees can be achieved, depending on the divergence between the approximate and true distributions. We also analyze the case where the labeling functions of the source domains differ. Finally, we report the results of experiments with both an artificial data set and a sentiment analysis task, showing the performance benefits of distribution-weighted combinations and the quality of our bounds based on the Rényi divergence.
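As a concrete illustration of the two quantities the abstract refers to, the sketch below computes the Rényi divergence D_α(P‖Q) between two discrete distributions and evaluates a distribution-weighted combination of source hypotheses, h(x) = Σ_k λ_k D_k(x) h_k(x) / Σ_k λ_k D_k(x). The function names and the finite discrete setting are illustrative assumptions for this sketch, not the paper's implementation.

```python
import math

def renyi_divergence(p, q, alpha=2.0):
    """D_alpha(P || Q) = 1/(alpha - 1) * log sum_i p_i^alpha * q_i^(1 - alpha),
    for discrete distributions, assuming q_i > 0 wherever p_i > 0 and alpha != 1."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha)
            for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1.0)

def distribution_weighted_combination(lambdas, densities, hypotheses, x):
    """h(x) = sum_k lambda_k * D_k(x) * h_k(x) / sum_k lambda_k * D_k(x):
    each source hypothesis is weighted by how likely x is under its source
    distribution, so sources that 'cover' x dominate the prediction."""
    num = sum(l * d(x) * h(x)
              for l, d, h in zip(lambdas, densities, hypotheses))
    den = sum(l * d(x) for l, d in zip(lambdas, densities))
    return num / den if den > 0 else 0.0

# Example: identical distributions give zero divergence; a skewed Q does not.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(renyi_divergence(p, p, alpha=2.0))  # 0.0
print(renyi_divergence(p, q, alpha=2.0) > 0)  # True

# Two sources: at a point where source 2's density is 3x source 1's,
# the combined prediction leans toward hypothesis 2.
lambdas = [0.5, 0.5]
densities = [lambda x: 1.0, lambda x: 3.0]
hypotheses = [lambda x: 0.0, lambda x: 1.0]
print(distribution_weighted_combination(lambdas, densities, hypotheses, 0))  # 0.75
```

Note that the divergence grows with α and blows up when Q assigns vanishing mass to points P supports, which is why the paper's guarantees degrade with the Rényi divergence between the target and the source combination.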

[1] Leslie G. Valiant, et al. A theory of the learnable, 1984, CACM.

[2] Robert L. Mercer, et al. Adaptive Language Modeling Using Minimum Discriminant Estimation, 1992, HLT.

[3] Chin-Hui Lee, et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, 1994, IEEE Trans. Speech Audio Process.

[4] Philip C. Woodland, et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, 1995, Comput. Speech Lang.

[5] Ronald Rosenfeld, et al. A maximum entropy approach to adaptive statistical language modelling, 1996, Comput. Speech Lang.

[6] Frederick Jelinek, et al. Statistical methods for speech recognition, 1997.

[7] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[8] Christoph Arndt, et al. Information Measures: Information and its Description in Science and Engineering, 2001.

[9] Aleix M. Martínez, et al. Recognizing Imprecisely Localized, Partially Occluded, and Expression Variant Faces from a Single Sample per Class, 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[10] Brian Roark, et al. Supervised and unsupervised PCFG adaptation to novel domains, 2003, NAACL.

[11] Daniel Marcu, et al. Domain Adaptation for Statistical Classifiers, 2006, J. Artif. Intell. Res.

[12] Koby Crammer, et al. Learning from Multiple Sources, 2006, NIPS.

[13] Koby Crammer, et al. Analysis of Representations for Domain Adaptation, 2006, NIPS.

[14] ChengXiang Zhai, et al. Instance Weighting for Domain Adaptation in NLP, 2007, ACL.

[15] John Blitzer, et al. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification, 2007, ACL.

[16] Koby Crammer, et al. Learning Bounds for Domain Adaptation, 2007, NIPS.

[17] John Blitzer, et al. Frustratingly Hard Domain Adaptation for Dependency Parsing, 2007, EMNLP.

[18] Yishay Mansour, et al. Domain Adaptation with Multiple Sources, 2008, NIPS.

[19] Heikki Kallasjoki. Methods for Spectral Envelope Estimation in Noise Robust Speech Recognition, 2009.