Cluster-than-Label: Semi-Supervised Approach for Domain Adaptation

The performance of a conventional machine learning model trained on a source domain degrades when it is tested on a different data distribution (the target domain). Traditional approaches deal with this problem by training a new model for each new data distribution. However, training a new model for every data distribution is computationally expensive. This paper demonstrates how to adapt to a new data distribution (target domain) by utilising the model trained on the source domain, avoiding both the cost of re-training and the need for access to the labelled source data. In particular, we introduce an Efficient Semi-supervised Cluster-than-Label Cross-domain Adaptation Algorithm (SCTLCDA) to address the cross-domain classification problem, in which we utilise both labelled and unlabelled data samples in the target domain, as well as completely unlabelled data samples in the source domain. We also show that the proposed method can manage large datasets and can be applied easily to cross-domain adaptation problems. The effectiveness and performance of our method are confirmed by experiments on two real-world applications: cross-domain sentiment classification and Web spam classification.
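The abstract does not spell out the steps of SCTLCDA, so the following is only a minimal sketch of the general cluster-then-label idea it builds on, not the authors' algorithm: cluster all target-domain points, then give each unlabelled point the majority label of the labelled points in its cluster. The function names (`kmeans`, `cluster_then_label`), the use of plain k-means, and the choice to seed one cluster per class at the mean of its labelled points are all assumptions made here for illustration.

```python
from collections import Counter, defaultdict

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, centers, iters=20):
    """Lloyd's k-means from given initial centers; returns a cluster index per point."""
    centers = [list(c) for c in centers]
    assign = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assign = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean(members)
    return assign

def cluster_then_label(points, labels):
    """labels[i] is a class label, or None if point i is unlabelled.

    Requires at least one labelled point. Labelled points keep their labels.
    """
    by_class = defaultdict(list)
    for p, l in zip(points, labels):
        if l is not None:
            by_class[l].append(p)
    # Assumption (not from the paper): one cluster per class,
    # seeded at the mean of that class's labelled points.
    seeds = [mean(ps) for ps in by_class.values()]
    assign = kmeans(points, seeds)
    # Clusters with no labelled member fall back to the overall majority label.
    fallback = Counter(l for l in labels if l is not None).most_common(1)[0][0]
    cluster_label = {}
    for c in range(len(seeds)):
        votes = Counter(l for l, a in zip(labels, assign)
                        if a == c and l is not None)
        cluster_label[c] = votes.most_common(1)[0][0] if votes else fallback
    return [l if l is not None else cluster_label[a]
            for l, a in zip(labels, assign)]
```

Called with a list of feature vectors and a parallel list of labels (using `None` for unlabelled samples), `cluster_then_label` returns a full list of labels. Seeding the clusters from the labelled points keeps the sketch deterministic; a practical system would more likely use a library clustering routine and a trained classifier per cluster.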
