Domain Adaptation with Topical Correspondence Learning

A serious and ubiquitous issue in machine learning is the lack of sufficient training data in a domain of interest. Domain adaptation addresses this problem by transferring information or models learned from related, albeit distinct, domains to the target domain. We develop a novel domain adaptation method for text document classification within the framework of non-negative matrix factorization (NMF). The method rests on two key ideas. The first is to construct a latent topic space in which each topic is decomposed into common words shared by all domains and words specific to individual domains. The second is to establish associations between words in different domains, using the common words as a bridge for knowledge transfer. The resulting correspondence between cross-domain topics makes the distributions of the source and target domains more coherent in the new representation while preserving predictive power. Our method outperformed several state-of-the-art domain adaptation methods on several benchmark datasets.
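
The first idea, splitting each topic into a shared part and a domain-specific part, can be illustrated with a small NumPy sketch. This is not the method from the paper, whose objective and update rules are not given in the abstract. It assumes a plain squared-error objective, sum over domains d of ||X_d - H_d (C + S_d)||^2, where C holds the common word weights of each topic, S_d the domain-specific word weights, and H_d the document-topic weights. The function name `fit_shared_topics`, all parameter names, and the synthetic data are hypothetical, and the word-association "bridge" step is not modelled.

```python
import numpy as np

def fit_shared_topics(X_src, X_tgt, k=4, n_iter=200, eps=1e-9, seed=0):
    """Toy joint NMF: X_d ~= H_d @ (C + S_d) for a source and a target domain.

    C   (k x words): word weights shared by both domains (assumed form).
    S_d (k x words): word weights specific to domain d.
    H_d (docs x k):  document-topic weights of domain d.
    """
    rng = np.random.default_rng(seed)
    n_words = X_src.shape[1]
    Xs = [X_src, X_tgt]
    C = rng.random((k, n_words))
    S = [rng.random((k, n_words)) for _ in Xs]
    H = [rng.random((X.shape[0], k)) for X in Xs]

    def loss():
        # Sum of squared Frobenius reconstruction errors over both domains.
        return sum(np.linalg.norm(X - Hd @ (C + Sd)) ** 2
                   for X, Hd, Sd in zip(Xs, H, S))

    losses = [loss()]
    for _ in range(n_iter):
        for d, X in enumerate(Xs):
            B = C + S[d]  # full topic-word matrix of domain d
            # Multiplicative updates keep every factor non-negative.
            H[d] *= (X @ B.T) / (H[d] @ B @ B.T + eps)
            S[d] *= (H[d].T @ X) / (H[d].T @ H[d] @ B + eps)
        # The common part is updated from both domains at once.
        num = sum(Hd.T @ X for X, Hd in zip(Xs, H))
        den = sum(Hd.T @ Hd @ (C + Sd) for Hd, Sd in zip(H, S))
        C *= num / (den + eps)
        losses.append(loss())
    return C, S, H, losses

# Synthetic non-negative document-term matrices over one shared vocabulary.
rng = np.random.default_rng(1)
X_src = rng.random((30, 40))  # 30 source documents, 40 words
X_tgt = rng.random((25, 40))  # 25 target documents, same 40 words
C, S, H, losses = fit_shared_topics(X_src, X_tgt)
```

In this sketch the rows of `H[0]` and `H[1]` place source and target documents in the same k-dimensional topic space, so a classifier trained on `H[0]` could in principle be applied to `H[1]`. Without an extra constraint such as a sparsity penalty on `S`, the split between `C` and `S` is not unique, so the learned factors should be read as illustrative only.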
