Multi-domain learning by confidence-weighted parameter combination

State-of-the-art statistical NLP systems for a variety of tasks learn from labeled training data that is often domain-specific. However, there may be multiple domains or sources of interest on which the system must perform. For example, a spam filtering system must give high-quality predictions for many users, each of whom receives email from different sources and may make slightly different decisions about what is or is not spam. Rather than learning a separate model for each domain, we explore systems that learn across multiple domains. We develop a new multi-domain online learning framework based on parameter combination from multiple classifiers. Our algorithms draw from multi-task learning and domain adaptation to adapt multiple source-domain classifiers to a new target domain, learn across multiple similar domains, and learn across a large number of disparate domains. We evaluate our algorithms on two popular NLP domain adaptation tasks: sentiment classification and spam filtering.
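The core idea of combining classifiers by confidence can be illustrated with a minimal sketch. This is not the paper's exact update rule; it assumes, as in confidence-weighted learning, that each per-domain classifier maintains a Gaussian over its weight vector (a mean and a per-feature variance), and combines the domains by precision-weighted averaging, so that parameters a classifier is confident about (low variance) dominate the combination:

```python
import numpy as np

def combine_confidence_weighted(means, variances):
    """Combine per-domain linear classifiers by precision-weighted averaging.

    means, variances: arrays of shape (n_domains, n_features), one row per
    domain classifier; variances encode (un)certainty in each weight.
    Returns the combined mean weights and combined per-feature variance.
    """
    precisions = 1.0 / variances                 # higher confidence -> larger weight
    combined_precision = precisions.sum(axis=0)  # precisions add under this scheme
    combined_mean = (precisions * means).sum(axis=0) / combined_precision
    combined_variance = 1.0 / combined_precision
    return combined_mean, combined_variance

# Hypothetical example: two domains disagree on one feature's weight; the
# combined weight sits between them, pulled toward the more confident domain.
mu, var = combine_confidence_weighted(
    np.array([[1.0], [3.0]]),   # domain means for a single feature
    np.array([[1.0], [1.0]]),   # equal variances -> simple average
)
```

With equal variances this reduces to plain averaging of the weights; unequal variances tilt the combined classifier toward the domain that has seen more (or more consistent) evidence for that feature.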
