Instance-based Domain Adaptation via Multiclustering Logistic Approximation

With the explosive growth of online text on the Internet, large amounts of labeled training data can now be easily collected from different source domains. However, a basic assumption in building statistical machine learning models for sentiment analysis is that the training and test data are drawn from the same distribution; directly training a statistical model usually yields poor performance when the training and test distributions differ. Faced with massive labeled data from different domains, it is therefore important to identify the source-domain training instances that are closely relevant to the target domain and to make better use of them. In this work, we propose a new approach, called multiclustering logistic approximation (MLA), to address this problem. In MLA, the source-domain training data are adapted to the target domain within a multiclustering logistic approximation framework. Experimental results demonstrate that MLA has significant advantages over state-of-the-art instance adaptation methods, especially when the training data are multidistributional.
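The abstract does not spell out the MLA algorithm itself, but the general idea of instance-based adaptation it builds on can be illustrated with a minimal, hedged sketch: train a logistic model to discriminate source from target instances, then weight each source instance by the estimated odds that it belongs to the target domain. All names and the toy data below are illustrative assumptions, not the authors' implementation.

```python
# Generic illustration of instance weighting for domain adaptation via a
# domain-discriminating logistic model. This is NOT the paper's MLA method;
# it is a minimal sketch of the underlying instance-adaptation idea.
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=500):
    # Plain stochastic-gradient logistic regression; y = 1 means "target domain".
    d = len(xs[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def importance_weights(source, target):
    # Train a classifier to separate source (label 0) from target (label 1)
    # instances, then weight each source instance by p(target|x)/p(source|x).
    xs = source + target
    ys = [0.0] * len(source) + [1.0] * len(target)
    w, b = train_logistic(xs, ys)
    weights = []
    for x in source:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        weights.append(p / max(1e-9, 1.0 - p))
    return weights


# Toy one-dimensional data: source instances spread over [0, 1],
# target instances clustered near 1.
source = [[0.0], [0.1], [0.9], [1.0]]
target = [[0.9], [1.0], [1.1]]
ws = importance_weights(source, target)
# Source instances closer to the target cluster receive larger weights.
```

The resulting weights can then rescale the loss of each source instance when training the final sentiment classifier, so that target-relevant instances dominate.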
