A Literature Review of Domain Adaptation with Unlabeled Data

In supervised learning, it is typically assumed that the labeled training data comes from the same distribution as the test data to which the system will be applied. In recent years, machine-learning researchers have investigated methods for handling mismatch between the training and test domains, with the goal of using labeled data from the old domain to build a classifier that performs well on test data in the new domain. This problem is called domain adaptation or transfer learning, and it is a common scenario in speech processing applications. Labeled training data is often produced by an expensive hand-annotation process and may consist of only one or two annotated corpora, which are used to train virtually all systems regardless of the target domain. Often little or no labeled data is available for the new domain.

In this work, we review the statistical machine learning literature on this problem. We focus on unsupervised domain adaptation methods, as opposed to model adaptation or supervised adaptation in which some labeled data is available from the test distribution. We consider four main classes of approaches in the literature: instance weighting for covariate shift; self-labeling methods; changes in feature representation; and cluster-based learning. Covariate shift methods re-weight training samples in the old domain to try to match the new domain, putting more weight on samples that fall in regions that are populous in the new domain. Self-labeling methods incorporate unlabeled target-domain examples into the training algorithm by making an initial guess about their labels and then iteratively refining the guesses or labeling more examples. Feature representation approaches try to find a new feature representation of the data, either to make the new and old distributions look similar or to find an abstracted representation for domain-specific features. Cluster-based methods rely on the assumption that samples connected by high-density paths are likely to have the same label.

Domain adaptation is a large area of research, with related work under several frameworks (and several names). A limited review from March 2008 can be found in [1], and one from October 2010 can be found in [2]. A recent book [3] investigates train/test distribution mismatch in machine learning, with a particular focus on covariate shift. Some of the organization here roughly follows that in [1].
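As a concrete illustration of the self-labeling idea described above, the following is a minimal sketch rather than any specific method from the reviewed literature. It assumes one-dimensional features and a nearest-centroid classifier, and the names `fit_centroids`, `predict`, and `self_train` are hypothetical. As a simplifying choice for this toy, each refinement round refits on the self-labeled target samples alone; self-training is often run on the union of labeled source data and self-labeled target data instead.

```python
from statistics import mean


def fit_centroids(xs, ys, fallback=None):
    """Return {label: mean feature value}; labels absent from ys keep the fallback centroid."""
    centroids = dict(fallback or {})
    for label in sorted(set(ys)):
        centroids[label] = mean(x for x, y in zip(xs, ys) if y == label)
    return centroids


def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def self_train(src_x, src_y, tgt_x, max_rounds=20):
    """Guess target labels with a source-trained model, then refine the guesses iteratively."""
    centroids = fit_centroids(src_x, src_y)            # train on the old (source) domain
    guesses = [predict(centroids, x) for x in tgt_x]   # initial guess for the new domain
    for _ in range(max_rounds):
        # Toy assumption: refit on the self-labeled target samples only.
        centroids = fit_centroids(tgt_x, guesses, fallback=centroids)
        refined = [predict(centroids, x) for x in tgt_x]
        if refined == guesses:                         # guesses have stopped changing
            break
        guesses = refined
    return centroids, guesses


# Made-up data: the target domain is a shifted copy of the source domain.
SRC_X = [-1.0, 0.0, 1.0, 3.0, 4.0, 5.0]
SRC_Y = [0, 0, 0, 1, 1, 1]
TGT_X = [1.0, 2.2, 3.0, 5.0, 6.0, 7.0]
TGT_Y = [0, 0, 0, 1, 1, 1]  # held out; used only to score the result

source_model = fit_centroids(SRC_X, SRC_Y)
source_only = [predict(source_model, x) for x in TGT_X]
adapted_model, adapted = self_train(SRC_X, SRC_Y, TGT_X)

print("source-only errors:", sum(p != t for p, t in zip(source_only, TGT_Y)))
print("self-trained errors:", sum(p != t for p, t in zip(adapted, TGT_Y)))
```

On this toy data the source-trained classifier mislabels two of the six target samples, and the refinement loop corrects both. That outcome depends on the two target clusters being well separated, which echoes the cluster assumption mentioned above; with overlapping classes, wrong initial guesses can just as easily be reinforced.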

[1]  Avrim Blum, et al.  The Bottleneck, 2021, Monopsony Capitalism.

[2]  Avrim Blum, et al.  Learning from Labeled and Unlabeled Data using Graph Mincuts, 2001, ICML.

[3]  Walter Daelemans, et al.  Using Domain Similarity for Performance Estimation, 2010, ACL 2010.

[4]  ChengXiang Zhai, et al.  A two-stage approach to domain adaptation for statistical classifiers, 2007, CIKM '07.

[5]  Qiang Yang, et al.  Spectral domain-transfer learning, 2008, KDD.

[6]  Xian Wu, et al.  Domain Adaptation with Latent Semantic Association for Named Entity Recognition, 2009, NAACL.

[7]  Fuzhen Zhuang, et al.  Inductive transfer learning for unlabeled target-domain via hybrid regularization, 2009.

[8]  Xiao Li, et al.  Inductive and example-based learning for text classification, 2008, INTERSPEECH.

[9]  Klaus-Robert Müller, et al.  Covariate Shift Adaptation by Importance Weighted Cross Validation, 2007, J. Mach. Learn. Res.

[10]  Marco Maggini, et al.  An EM based training algorithm for cross-language text categorization, 2005, The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05).

[11]  Tong Zhang, et al.  A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data, 2005, J. Mach. Learn. Res.

[12]  Jingrui He, et al.  Graph-based transfer learning, 2009, CIKM.

[13]  Brian Roark, et al.  Supervised and unsupervised PCFG adaptation to novel domains, 2003, NAACL.

[14]  Christopher Joseph Pal, et al.  Cross Lingual Adaptation: An Experiment on Sentiment Classifications, 2010, ACL.

[15]  Hwee Tou Ng, et al.  Estimating Class Priors in Domain Adaptation for Word Sense Disambiguation, 2006, ACL.

[16]  Thorsten Joachims, et al.  Transductive Inference for Text Classification using Support Vector Machines, 1999, ICML.

[17]  Benno Stein, et al.  Cross-Language Text Classification Using Structural Correspondence Learning, 2010, ACL.

[18]  Qiang Yang, et al.  Transfer Learning via Dimensionality Reduction, 2008, AAAI.

[19]  Ramesh Nallapati, et al.  A Comparative Study of Methods for Transductive Transfer Learning, 2007.

[20]  Jörg Tiedemann, et al.  Context Adaptation in Statistical Machine Translation Using Models with Exponentially Decaying Cache, 2010, ACL 2010.

[21]  Marco Saerens, et al.  Adjusting the Outputs of a Classifier to New a Priori Probabilities: A Simple Procedure, 2002, Neural Computation.

[22]  Michael I. Jordan, et al.  Learning from Incomplete Data, 1994.

[23]  Songbo Tan, et al.  Improving SCL Model for Sentiment-Transfer Learning, 2009, HLT-NAACL.

[24]  Sebastian Thrun, et al.  Text Classification from Labeled and Unlabeled Documents using EM, 2000, Machine Learning.

[25]  John Blitzer, et al.  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification, 2007, ACL.

[26]  Vassilios Digalakis, et al.  Speaker adaptation using constrained estimation of Gaussian mixtures, 1995, IEEE Trans. Speech Audio Process.

[27]  Jun'ichi Tsujii, et al.  Dependency Parsing and Domain Adaptation with LR Models and Parser Ensembles, 2007, EMNLP.

[28]  Zoubin Ghahramani, et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions, 2003, ICML 2003.

[29]  Massimiliano Ciaramita, et al.  Adaptive Parameters for Entity Recognition with Perceptron HMMs, 2010.

[30]  Lei Shi, et al.  Cross Language Text Classification by Model Translation and Semi-Supervised Learning, 2010, EMNLP.

[31]  Giuseppe Carenini, et al.  Domain Adaptation to Summarize Human Conversations, 2010.

[32]  Michael L. Littman, et al.  Automatic Cross-Language Retrieval Using Latent Semantic Indexing, 1997.

[33]  Yong Yu, et al.  Bridged Refinement for Transfer Learning, 2007, PKDD.

[34]  ChengXiang Zhai, et al.  Instance Weighting for Domain Adaptation in NLP, 2007, ACL.

[35]  Michael I. Jordan, et al.  On Spectral Clustering: Analysis and an algorithm, 2001, NIPS.

[36]  Ivor W. Tsang, et al.  Domain Adaptation via Transfer Component Analysis, 2009, IEEE Transactions on Neural Networks.

[37]  Amos Storkey, et al.  When Training and Test Sets are Different: Characterising Learning Transfer, 2013.

[38]  Thorsten Joachims, et al.  Transductive Learning via Spectral Graph Partitioning, 2003, ICML.

[39]  Hans-Peter Kriegel, et al.  Integrating structured biological data by Kernel Maximum Mean Discrepancy, 2006, ISMB.

[40]  Neil D. Lawrence, et al.  Geometry of Covariate Shift with Applications to Active Learning, 2009.

[41]  Tom M. Mitchell, et al.  Semi-Supervised Text Classification Using EM, 2006, Semi-Supervised Learning.

[42]  Bhuvana Ramabhadran, et al.  Discriminative training and unsupervised adaptation for labeling prosodic events with limited training data, 2010, INTERSPEECH.

[43]  Bianca Zadrozny, et al.  Learning and evaluating classifiers under sample selection bias, 2004, ICML.

[44]  Mehryar Mohri, et al.  Sample Selection Bias Correction Theory, 2008, ALT.

[45]  Jun Zhao, et al.  NLPR at Multilingual Opinion Analysis Task in NTCIR7, 2008, NTCIR.

[46]  Alexander Yates, et al.  Open-Domain Semantic Role Labeling by Modeling Word Spans, 2010, ACL.

[47]  Qiang Yang, et al.  Translated Learning: Transfer Learning across Different Feature Spaces, 2008, NIPS.

[48]  Hongbo Xu, et al.  Adapting Naive Bayes to Domain Adaptation for Sentiment Analysis, 2009, ECIR.

[49]  Shai Ben-David, et al.  Detecting Change in Data Streams, 2004, VLDB.

[50]  Ivor W. Tsang, et al.  Extracting discriminative concepts for domain adaptation in text mining, 2009, KDD.

[51]  D. Rubin, et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper, 1977.

[52]  Thorsten Joachims, et al.  Detecting Concept Drift with Support Vector Machines, 2000, ICML.

[53]  Qiang Yang, et al.  Co-clustering based classification for out-of-domain documents, 2007, KDD '07.

[54]  Bernhard Schölkopf, et al.  A Kernel Method for the Two-Sample-Problem, 2006, NIPS.

[55]  Li Lee, et al.  A frequency warping approach to speaker normalization, 1998, IEEE Trans. Speech Audio Process.

[56]  Ji Zhu, et al.  A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning, 2004, NIPS.

[57]  Philip S. Yu, et al.  An improved categorization of classifier's sensitivity on sample selection bias, 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[58]  Man Lung Yiu, et al.  Group-by skyline query processing in relational engines, 2009, CIKM.

[59]  Seong-Bae Park, et al.  Coping with Distribution Change in the Same Domain Using Similarity-Based Instance Weighting, 2009, ACML.

[60]  Yiming Yang, et al.  Domain adaptation of translation models for multilingual applications, 2009.

[61]  T. Landauer, et al.  Indexing by Latent Semantic Analysis, 1990.

[62]  Yoshua Bengio, et al.  Semi-supervised Learning by Entropy Minimization, 2004, CAP.

[63]  Songbo Tan, et al.  Using unlabeled data to handle domain-transfer problem of semantic detection, 2008, SAC '08.

[64]  Sunita Sarawagi, et al.  Domain Adaptation of Conditional Probability Models Via Feature Subsetting, 2007, PKDD.

[65]  Naftali Tishby, et al.  The information bottleneck method, 2000, ArXiv.

[66]  John C. Platt, et al.  Translingual Document Representations from Discriminative Projections, 2010, EMNLP.

[67]  Changshui Zhang, et al.  Transferred Dimensionality Reduction, 2008, ECML/PKDD.

[68]  Aleksander Kolcz, et al.  Feature Weighting for Improved Classifier Robustness, 2009, CEAS 2009.

[69]  Bhuvana Ramabhadran, et al.  Unsupervised Model Adaptation using Information-Theoretic Criterion, 2010, HLT-NAACL.

[70]  Fei Huang, et al.  Exploring Representation-Learning Approaches to Domain Adaptation, 2010.

[71]  Joost N. Kok, et al.  Knowledge Discovery in Databases: PKDD 2007, 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, Warsaw, Poland, September 17-21, 2007, Proceedings, 2007, PKDD.

[72]  Qiang Yang, et al.  A Survey on Transfer Learning, 2010, IEEE Transactions on Knowledge and Data Engineering.

[73]  Alexander Yates, et al.  Distributional Representations for Handling Sparsity in Supervised Sequence-Labeling, 2009, ACL.

[74]  Jiawei Han, et al.  Knowledge transfer via multiple model local structure mapping, 2008, KDD.

[75]  Bernhard Schölkopf, et al.  Correcting Sample Selection Bias by Unlabeled Data, 2006, NIPS.

[76]  Nello Cristianini, et al.  Inferring a Semantic Representation of Text via Cross-Language Correlation Analysis, 2002, NIPS.

[77]  Dan Klein, et al.  Spectral Learning, 2003, IJCAI.

[78]  John Blitzer, et al.  Domain Adaptation with Structural Correspondence Learning, 2006, EMNLP.

[79]  Eugene Charniak, et al.  Automatic Domain Adaptation for Parsing, 2010, NAACL.

[80]  Massih-Reza Amini, et al.  Semi Supervised Logistic Regression, 2002, ECAI.

[81]  Andreas Stolcke, et al.  Integrating MAP, marginals, and unsupervised language model adaptation, 2007, INTERSPEECH.

[82]  H. Shimodaira, et al.  Improving predictive inference under covariate shift by weighting the log-likelihood function, 2000.

[83]  Motoaki Kawanabe, et al.  Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation, 2007, NIPS.

[84]  Sean Borman, et al.  The Expectation Maximization Algorithm A short tutorial, 2006.

[85]  Yishay Mansour, et al.  Domain Adaptation with Multiple Sources, 2008, NIPS.

[86]  Mark Hasegawa-Johnson, et al.  Maximum mutual information estimation with unlabeled data for phonetic classification, 2008, INTERSPEECH.

[87]  Gang Niu, et al.  Transfer learning via multi-view principal component analysis, 2011.

[88]  Qiang Yang, et al.  EigenTransfer: a unified framework for transfer learning, 2009, ICML '09.

[89]  Xiaojun Wan, et al.  Co-Training for Cross-Lingual Sentiment Classification, 2009, ACL.

[90]  Wen Wang.  Combining discriminative re-ranking and co-training for parsing Mandarin speech transcripts, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[91]  Steffen Bickel, et al.  Discriminative learning for differing training and test distributions, 2007, ICML '07.

[92]  Qiang Yang, et al.  Can chinese web pages be classified with english data source?, 2008, WWW.

[93]  Philip C. Woodland, et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, 1995, Comput. Speech Lang.

[94]  Qiang Yang, et al.  Cross-domain sentiment classification via spectral feature alignment, 2010, WWW '10.

[95]  Yishay Mansour, et al.  Learning Bounds for Importance Weighting, 2010, NIPS.

[96]  Gary Geunbae Lee, et al.  Semi-supervised Speech Act Recognition in Emails and Forums, 2009, EMNLP.

[97]  J. Heckman.  Sample selection bias as a specification error, 1979.

[98]  Mari Ostendorf, et al.  Domain Adaptation with Unlabeled Data for Dialog Act Tagging, 2010.

[99]  Eugene Charniak, et al.  Reranking and Self-Training for Parser Adaptation, 2006, ACL.

[100]  Koby Crammer, et al.  Learning Bounds for Domain Adaptation, 2007, NIPS.

[101]  Rajat Raina, et al.  Self-taught learning: transfer learning from unlabeled data, 2007, ICML '07.

[102]  Michael I. Jordan, et al.  Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[103]  Le Song, et al.  Colored Maximum Variance Unfolding, 2007, NIPS.

[104]  Yishay Mansour, et al.  Domain Adaptation: Learning Bounds and Algorithms, 2009, COLT.

[105]  Renato De Mori, et al.  A Cache-Based Natural Language Model for Speech Recognition, 1990, IEEE Trans. Pattern Anal. Mach. Intell.

[106]  Mark Steedman, et al.  Example Selection for Bootstrapping Statistical Parsers, 2003, NAACL.

[107]  Kenji Sagae.  Self-Training without Reranking for Parser Domain Adaptation and Its Impact on Semantic Role Labeling, 2010.

[108]  Koby Crammer, et al.  Analysis of Representations for Domain Adaptation, 2006, NIPS.

[109]  Brian Roark, et al.  Unsupervised language model adaptation, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[110]  Miroslav Dudík, et al.  Correcting sample selection bias in maximum entropy density estimation, 2005, NIPS.

[111]  Masashi Sugiyama, et al.  Binary Classification under Sample Selection Bias, 2009.

[112]  Kilian Q. Weinberger, et al.  Learning a kernel matrix for nonlinear dimensionality reduction, 2004, ICML.

[113]  Qiang Yang, et al.  Topic-bridged PLSA for cross-domain text classification, 2008, SIGIR '08.

[114]  Yong Yu, et al.  Knowledge Transferring Via Implicit Link Analysis, 2008, DASFAA.

[115]  Rie Kubota Ando, et al.  Exploiting Unannotated Corpora for Tagging and Chunking, 2004, ACL.

[116]  Philip S. Yu, et al.  Type-Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing, 2008, SDM.

[117]  Masashi Sugiyama, et al.  Direct Density Ratio Estimation for Large-scale Covariate Shift Adaptation, 2009, J. Inf. Process.

[118]  Steffen Bickel, et al.  Dirichlet-Enhanced Spam Filtering based on Biased Samples, 2006, NIPS.

[119]  Manuel A. Sánchez-Montañés, et al.  A New Learning Strategy for Classification Problems with Different Training and Test Distributions, 2007, IWANN.

[120]  John Blitzer, et al.  Frustratingly Hard Domain Adaptation for Dependency Parsing, 2007, EMNLP.

[121]  Qiang Yang, et al.  Transferring Naive Bayes Classifiers for Text Classification, 2007, AAAI.