Transfer Ordinal Label Learning

Designing a classifier in the absence of labeled data is an increasingly common challenge, as the acquisition of informative labels is often difficult or expensive, particularly in new, uncharted target domains. One way to obtain a reliable classifier for the task of interest is transfer learning, in which label information from relevant source domains is used to complement the design process. The core challenge arising from such endeavors, however, is source sample selection bias: the trained classifier tends to steer toward the distribution of the source domain. Moreover, this bias is expected to become more severe on data involving multiple classes. Motivated by this observation, this paper addresses the challenge in the target domain, where ordinal labeled data are unavailable. In contrast to previous works, we propose a transfer ordinal label learning paradigm that predicts the ordinal labels of unlabeled target data by spanning the feasible solution space with an ensemble of ordinal classifiers from multiple relevant source domains. Specifically, the maximum margin criterion is used to construct the target classifier from the ensemble of source ordinal classifiers. Theoretical analysis and extensive empirical studies on real-world data sets are presented to demonstrate the benefits of the proposed method.
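
The abstract does not spell out the algorithm, but the general idea of weighting pre-trained source ordinal classifiers by a margin-style criterion on unlabeled target data can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the toy threshold-based ordinal model (SimpleOrdinalClassifier), the margin_proxy objective, the synthetic domains, and the grid search over ensemble weights are hypothetical and are not taken from the paper.

# Illustrative sketch only (not the authors' algorithm): combine pre-trained
# source ordinal classifiers for an unlabeled target domain by choosing ensemble
# weights with a simple margin-style proxy.
import numpy as np


class SimpleOrdinalClassifier:
    """Toy threshold-based ordinal model: linear score plus ordered cut points."""

    def fit(self, X, y):
        # Least-squares fit of a linear scoring function s(x) = [x, 1] @ w.
        Xb = np.hstack([X, np.ones((len(X), 1))])
        self.w, *_ = np.linalg.lstsq(Xb, y.astype(float), rcond=None)
        scores = Xb @ self.w
        self.ranks = np.sort(np.unique(y))
        # Cut points between adjacent ranks: midpoints of class-mean scores.
        means = [scores[y == r].mean() for r in self.ranks]
        self.cuts = np.array([(a + b) / 2 for a, b in zip(means[:-1], means[1:])])
        return self

    def score(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        return Xb @ self.w

    def predict(self, X):
        # Predicted rank = number of cut points the score exceeds.
        return self.ranks[np.searchsorted(self.cuts, self.score(X))]


def margin_proxy(scores, cuts):
    """Average distance of scores to the nearest cut point (larger = wider margin)."""
    return np.abs(scores[:, None] - cuts[None, :]).min(axis=1).mean()


rng = np.random.default_rng(0)


def make_domain(shift, n=200):
    # Synthetic source domain with a shifted feature distribution and 3 ordinal ranks.
    X = rng.normal(shift, 1.0, size=(n, 2))
    y = np.digitize(X.sum(axis=1), bins=[-1.0 + 2 * shift, 1.0 + 2 * shift])
    return X, y


# Train one ordinal classifier per source domain.
sources = [make_domain(0.0), make_domain(0.5)]
clfs = [SimpleOrdinalClassifier().fit(X, y) for X, y in sources]

# Unlabeled target data drawn from a distribution between the two sources.
X_target = rng.normal(0.25, 1.0, size=(300, 2))

# Search convex combination weights over the source classifiers and keep the
# combination whose target scores lie farthest from the averaged cut points.
best = None
for w1 in np.linspace(0.0, 1.0, 21):
    weights = np.array([w1, 1.0 - w1])
    scores = sum(w * c.score(X_target) for w, c in zip(weights, clfs))
    cuts = np.sort(sum(w * c.cuts for w, c in zip(weights, clfs)))
    m = margin_proxy(scores, cuts)
    if best is None or m > best[0]:
        best = (m, weights, np.searchsorted(cuts, scores))

print("chosen weights:", best[1], "predicted rank counts:", np.bincount(best[2]))

In this sketch the "feasible solution space" is the set of labelings produced by convex combinations of the source classifiers, and the margin proxy plays the role of the maximum margin criterion used to select among them; the actual formulation in the paper is not reproduced here.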
