Supervised Adaptive-Transfer PLSA for Cross-Domain Text Classification

Cross-domain learning is a promising technique for improving classification in a target (testing) domain whose data distribution differs substantially from that of the source (training) domain. Many cross-domain text classification methods are built on topic models. However, topic models are unsupervised by nature and do not fully exploit the label information available in the source domain. In addition, almost all cross-domain learning approaches use source-domain knowledge in the later stage of the training process, which limits knowledge transfer. In this paper, we propose Supervised Adaptive-transfer Probabilistic Latent Semantic Analysis (SAtPLSA), a model for cross-domain text classification that addresses both issues. The proposed model extends the original PLSA to a supervised learning paradigm. By defining the label information that each term shares across domains, we transfer knowledge from the source domain to assist in classifying text in the target domain. In addition, we adaptively adjust the weight that controls how much source-domain knowledge is used during model learning. Finally, we conduct experiments on nine benchmark datasets for cross-domain text classification, comparing the proposed algorithm with two classical supervised learning methods and five state-of-the-art transfer learning approaches. The experimental results demonstrate the effectiveness and efficiency of the proposed SAtPLSA algorithm.
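
For orientation, the unsupervised PLSA that the proposed model extends can be sketched as below. This is a minimal sketch of plain PLSA fitted with EM, under the usual factorisation P(w|d) = sum_z P(w|z) P(z|d). It is not the SAtPLSA model: the supervised label terms, the cross-domain sharing, and the adaptive source weight are not specified in this abstract and are left out. The names `plsa`, `normalize`, `counts`, and `n_topics` are illustrative assumptions.

```python
import math
import random


def normalize(row):
    """Scale a non-negative row so it sums to 1 (uniform if it is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [v / total for v in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit plain (unsupervised) PLSA with EM.

    counts[d][w] is the number of times term w occurs in document d.
    Returns P(z|d), P(w|z), and the log-likelihood seen at each iteration.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])

    # Strictly positive random initialisation; every row is a distribution.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_terms)])
             for _ in range(n_topics)]

    log_liks = []
    for _ in range(n_iter):
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        acc_w_z = [[0.0] * n_terms for _ in range(n_topics)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_terms):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                p_w_d = sum(joint)
                log_lik += n_dw * math.log(p_w_d)
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    resp = n_dw * joint[z] / p_w_d
                    acc_z_d[d][z] += resp
                    acc_w_z[z][w] += resp
        # M-step: renormalise the expected counts into distributions.
        p_z_d = [normalize(r) for r in acc_z_d]
        p_w_z = [normalize(r) for r in acc_w_z]
        log_liks.append(log_lik)
    return p_z_d, p_w_z, log_liks


# Toy corpus: two documents use terms 0-1, two use terms 2-3.
toy_counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
doc_topics, topic_terms, history = plsa(toy_counts, n_topics=2, n_iter=30)
```

Each row of `doc_topics` is a distribution over topics for one document, and `history` should never decrease, since EM does not lower the likelihood from one iteration to the next.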
