Domain Adaptation: A Small Sample Statistical Approach

We study the prevalent problem in which the test distribution differs from the training distribution. We consider a setting where the training set consists of a small number of sample domains, each containing many samples, and the goal is to generalize to a new domain. For example, we may want to learn a similarity function using only certain classes of objects, yet have that similarity function apply to object classes absent from our training sample (e.g., we might seek to learn that “dogs are similar to dogs” even though images of dogs were absent from our training set). Our theoretical analysis shows that, by exploiting data-dependent variance properties, we can select many more features than domains while avoiding overfitting. We present a greedy feature selection algorithm based on T-statistics. Our experiments validate this theory, showing that T-statistic-based greedy feature selection is more robust against overfitting than the classical greedy procedure.
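The idea of ranking features by a T-statistic computed across domains can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes each feature already has a per-domain score (e.g., a per-domain correlation with the target), and that a feature is worth selecting when its scores are consistently far from zero relative to their across-domain variability. The function names and the one-sample-against-zero form of the statistic are illustrative assumptions.

```python
import math

def t_statistic(scores):
    """One-sample T-statistic of per-domain scores against zero:
    mean / (sample std / sqrt(n_domains)). Features whose usefulness
    varies wildly across domains get a small statistic even if their
    mean score is nonzero."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    if var == 0.0:
        return math.inf if mean > 0 else 0.0
    return mean / math.sqrt(var / n)

def greedy_t_select(domain_scores, k):
    """domain_scores maps feature name -> list of per-domain scores
    (illustrative input format). Greedily keep the k features with the
    largest T-statistics, i.e., the most domain-consistent ones."""
    stats = {f: t_statistic(s) for f, s in domain_scores.items()}
    return sorted(stats, key=lambda f: stats[f], reverse=True)[:k]
```

For instance, a feature scoring [1.0, 1.1, 0.9] across three domains has a large T-statistic and would be kept ahead of one scoring [2.0, -1.9, 0.1], whose higher variance signals domain-specific (overfit-prone) usefulness; a classical greedy criterion looking only at pooled mean score would not make this distinction.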
