Transfer Learning for Cross-Language Text Categorization through Active Correspondences Construction

Most existing heterogeneous transfer learning (HTL) methods for cross-language text classification rely on sufficient cross-domain instance correspondences to learn a mapping across heterogeneous feature spaces, and assume that such correspondences are given in advance. In practice, however, correspondences between domains are usually unknown, and extensive manual effort is required to establish accurate correspondences across multilingual documents based on their content and meta-information. In this paper, we present a general framework that integrates active learning into HTL to construct correspondences between heterogeneous domains, namely HTL through active correspondences construction (HTLA). Based on this framework, we develop a new HTL method and, on top of it, propose a strategy to actively construct correspondences between domains. Extensive experiments are conducted on various multilingual text classification tasks to verify the effectiveness of HTLA.
