Domain Adaptation of Conditional Probability Models Via Feature Subsetting

The goal in domain adaptation is to train a model using labeled data sampled from a domain different from the target domain on which the model will be deployed. We exploit unlabeled data from the target domain to train a model that maximizes likelihood over the training sample while minimizing the distance between the training and target distribution. Our focus is conditional probability models used for predicting a label structure y given input x based on features defined jointly over x and y . We propose practical measures of divergence between the two domains based on which we penalize features with large divergence, while improving the effectiveness of other less deviant correlated features. Empirical evaluation on several real-life information extraction tasks using Conditional Random Fields (CRFs) show that our method of domain adaptation leads to significant reduction in error.

[1]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[2]  Honglak Lee,et al.  Efficient L1 Regularized Logistic Regression , 2006, AAAI.

[3]  Bianca Zadrozny,et al.  Learning and evaluating classifiers under sample selection bias , 2004, ICML.

[4]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[5]  ChengXiang Zhai,et al.  Exploiting Domain Structure for Named Entity Recognition , 2006, HLT-NAACL.

[6]  Hal Daumé,et al.  Frustratingly Easy Domain Adaptation , 2007, ACL.

[7]  Amir Globerson,et al.  Nightmare at test time: robust learning by feature deletion , 2006, ICML.

[8]  Andrew McCallum,et al.  Accurate Information Extraction from Research Papers using Conditional Random Fields , 2004, NAACL.

[9]  Bernhard Schölkopf,et al.  Correcting Sample Selection Bias by Unlabeled Data , 2006, NIPS.

[10]  John Blitzer,et al.  Domain Adaptation with Structural Correspondence Learning , 2006, EMNLP.

[11]  Tong Zhang,et al.  A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res..

[12]  Dunja Mladenic,et al.  Feature Selection for Unbalanced Class Distribution and Naive Bayes , 1999, ICML.

[13]  Koby Crammer,et al.  Analysis of Representations for Domain Adaptation , 2006, NIPS.

[14]  Xiao Li,et al.  A Bayesian Divergence Prior for Classiffier Adaptation , 2007, AISTATS.

[15]  Dale Schuurmans,et al.  Semi-Supervised Conditional Random Fields for Improved Sequence Segmentation and Labeling , 2006, ACL.

[16]  Alex Acero,et al.  Adaptation of Maximum Entropy Capitalizer: Little Data Can Help a Lo , 2006, Comput. Speech Lang..