Cross-Domain Knowledge Transfer Using Semi-supervised Classification

Traditional text classification algorithms rest on a basic assumption: the training and test data are drawn from the same distribution. In real applications, however, this identical-distribution assumption is often violated. When the test data come from a target domain whose distribution differs from that of the training data, which come from an auxiliary domain, we call the problem cross-domain classification. Although most of the training data are drawn from the auxiliary domain, a few labeled training examples from the target domain are still available. To solve the cross-domain classification problem in this setting, we propose a two-stage algorithm based on semi-supervised classification. It first uses the labeled target-domain data to filter the support vectors of the auxiliary domain, and then uses the filtered data together with the labeled target-domain data to construct a classifier for the target domain. Experimental evaluation on real-world text classification problems demonstrates encouraging results and validates our approach.
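The two-stage procedure can be sketched in code. The following is a minimal illustration only, assuming scikit-learn's SVC as the base learner and a k-nearest-neighbor agreement vote over the labeled target examples as the filtering rule; the abstract does not specify the actual filtering criterion, so that rule is a hypothetical stand-in.

```python
# A minimal sketch of the two-stage idea, not the paper's exact procedure.
# Hypothetical choices (not stated in the abstract): an RBF-kernel SVM as the
# base learner, and a k-NN agreement vote over the labeled target data as the
# support-vector filtering rule.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def two_stage_transfer(X_aux, y_aux, X_tgt, y_tgt, k=3):
    # Stage 1: train on the auxiliary domain and keep only its support vectors.
    base = SVC(kernel="rbf").fit(X_aux, y_aux)
    sv_X = X_aux[base.support_]
    sv_y = y_aux[base.support_]

    # Filter: keep an auxiliary support vector only if the k nearest labeled
    # target examples vote for the same label (assumed filtering rule).
    knn = KNeighborsClassifier(n_neighbors=min(k, len(y_tgt))).fit(X_tgt, y_tgt)
    keep = knn.predict(sv_X) == sv_y

    # Stage 2: train the target-domain classifier on the filtered auxiliary
    # support vectors plus the few labeled target examples.
    X_final = np.vstack([sv_X[keep], X_tgt])
    y_final = np.concatenate([sv_y[keep], y_tgt])
    return SVC(kernel="rbf").fit(X_final, y_final)
```

Restricting stage 1 to support vectors keeps only the auxiliary examples that shape the decision boundary, and the agreement filter then discards those that conflict with the target distribution, which is one way to realize the filtering step the abstract describes.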
