Cross-Domain Labeled LDA for Cross-Domain Text Classification

Cross-domain text classification aims to build a classifier for a target domain by leveraging data from both the source and target domains. One promising idea is to minimize the difference between the feature distributions of the two domains. Most existing studies minimize this difference explicitly through an exact alignment mechanism (e.g., one-to-one feature alignment or a projection matrix). Such exact alignment, however, restricts a model's capacity to learn and further impairs its classification performance when the semantic distributions of the two domains differ substantially. To address this problem, we propose a novel group alignment that aligns semantics at the group level. In addition, to help the model learn better semantic groups and the semantics within them, we introduce partial supervision for the model's learning in the source domain. To this end, we embed the group alignment and partial supervision into a cross-domain topic model, and propose Cross-Domain Labeled LDA (CDL-LDA). On the standard 20Newsgroups and Reuters datasets, extensive quantitative (classification accuracy, perplexity, etc.) and qualitative (topic detection) experiments demonstrate the effectiveness of the proposed group alignment and partial supervision.
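The contrast between exact alignment and group alignment can be illustrated with a minimal sketch (this is not the CDL-LDA model itself, whose inference is a topic model; the topic distributions, grouping, and KL-based divergence below are illustrative assumptions). Exact alignment ties source topic i to target topic i one-to-one, while group alignment only ties the aggregate semantics of a whole group of topics:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions (smoothed)."""
    p = p + eps
    q = q + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical topic-word distributions: one row per topic, one column per word.
rng = np.random.default_rng(0)
src = rng.dirichlet(np.ones(6), size=4)   # 4 source-domain topics
tgt = rng.dirichlet(np.ones(6), size=4)   # 4 target-domain topics

# Exact alignment: source topic i must match target topic i individually.
exact_div = np.mean([kl(src[i], tgt[i]) for i in range(4)])

# Group alignment: topics are tied only at the group level, so each group's
# mean distribution is compared instead of each individual topic pair.
groups = [[0, 1], [2, 3]]                 # two illustrative semantic groups
group_div = np.mean([kl(src[g].mean(axis=0), tgt[g].mean(axis=0))
                     for g in groups])
```

Because KL divergence is jointly convex, the group-level divergence never exceeds the per-topic average, which reflects the intuition that group alignment imposes a looser constraint than exact one-to-one alignment.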
