论文信息 - A Literature Survey on Domain Adaptation of Statistical Classifiers

A Literature Survey on Domain Adaptation of Statistical Classifiers

The domain adaptation problem, especially domain adaptation in natural language processing, started gaining much attention very recently [Daumé III and Marcu, 2006, Blitzer et al., 2006, Ben-David et al., 2007, Daumé III, 2007, Satpal and Sarawagi, 2007]. However, some special kinds of domain adaptation problems have been studied before under different names such as class imbalance [Japkowicz and Stephen, 2002], covariate shift [Shimodaira, 2000], and sample selection bias [Heckman, 1979]. There are also some well-studied machine learning problems that are closely related but not equivalent to domain adaptation, including multi-task learning [Caruana, 1997] and semi-supervised learning [Chapelle et al., 2006]. In this literature survey, we review existing work in both the machine learning and the natural language processing communities related to domain adaptation. Because this relatively new topic is constantly attracting attention, our survey is necessarily incomplete. Nevertheless, we try to cover the major lines of work that we are aware of up to the date this survey is written. This survey will also be constantly updated. The goal of this literature survey is twofold. First, existing studies on domain adaptation seem very different from each other, and different terms are used to refer to the problem. There has not been any survey that connects these different studies. This survey thus tries to organize the existing work in a systematic way and draw a big picture of the domain adaptation problem with its possible solutions. Second, a systematic literature survey shows the limitations of current work and points out promising directions that should be explored.

James J. Jiang

[1] J. Heckman. Sample selection bias as a specification error , 1979 .

[2] Stan Matwin,et al. Addressing the Curse of Imbalanced Training Sets: One-Sided Selection , 1997, ICML.

[3] Rich Caruana,et al. Multitask Learning , 1997, Machine-mediated learning.

[4] H. Shimodaira,et al. Improving predictive inference under covariate shift by weighting the log-likelihood function , 2000 .

[5] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[6] JapkowiczNathalie,et al. The class imbalance problem: A systematic study , 2002 .

[7] Nitesh V. Chawla,et al. SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[8] Nathalie Japkowicz,et al. The class imbalance problem: A systematic study , 2002, Intell. Data Anal..

[9] T. Ben-David,et al. Exploiting Task Relatedness for Multiple , 2003 .

[10] Charles A. Micchelli,et al. Kernels for Multi--task Learning , 2004, NIPS.

[11] Bianca Zadrozny,et al. Learning and evaluating classifiers under sample selection bias , 2004, ICML.