Learning the Shared Subspace for Multi-task Clustering and Transductive Transfer Classification

There are many clustering tasks which are closely related in the real world, e.g. clustering the web pages of different universities. However, existing clustering approaches neglect the underlying relation and treat these clustering tasks either individually or simply together. In this paper, we will study a novel clustering paradigm, namely multi-task clustering, which performs multiple related clustering tasks together and utilizes the relation of these tasks to enhance the clustering performance. We aim to learn a subspace shared by all the tasks, through which the knowledge of the tasks can be transferred to each other. The objective of our approach consists of two parts: (1) Within-task clustering: clustering the data of each task in its input space individually; and (2) Cross-task clustering: simultaneous learning the shared subspace and clustering the data of all the tasks together. We will show that it can be solved by alternating minimization, and its convergence is theoretically guaranteed. Furthermore, we will show that given the labels of one task, our multi-task clustering method can be extended to transductive transfer classification (a.k.a. cross-domain classification, domain adaption). Experiments on several cross-domain text data sets demonstrate that the proposed multi-task clustering outperforms traditional single-task clustering methods greatly. And the transductive transfer classification method is comparable to or even better than several existing transductive transfer classification approaches.

[1]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[2]  Jiawei Han,et al.  Non-negative Matrix Factorization on Manifold , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[3]  Thomas G. Dietterich,et al.  Improving SVM accuracy by training on auxiliary data sources , 2004, ICML.

[4]  Inderjit S. Dhillon,et al.  Semi-supervised graph clustering: a kernel approach , 2005, ICML '05.

[5]  Claire Cardie,et al.  Constrained K-means Clustering with Background Knowledge , 2001, ICML.

[6]  Rajat Raina,et al.  Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[7]  Qiang Yang,et al.  Boosting for transfer learning , 2007, ICML '07.

[8]  Inderjit S. Dhillon,et al.  Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.

[9]  Charles A. Micchelli,et al.  Kernels for Multi--task Learning , 2004, NIPS.

[10]  Rich Caruana,et al.  Multitask Learning , 1997, Machine-mediated learning.

[11]  Qiang Yang,et al.  Co-clustering based classification for out-of-domain documents , 2007, KDD '07.

[12]  Yizhou Sun,et al.  Heterogeneous source consensus learning via decision propagation and negotiation , 2009, KDD.

[13]  Quanquan Gu,et al.  Local Learning Regularized Nonnegative Matrix Factorization , 2009, IJCAI.

[14]  Tao Li,et al.  Document clustering via adaptive subspace iteration , 2004, SIGIR '04.

[15]  Chris H. Q. Ding,et al.  Solving Consensus and Semi-supervised Clustering Problems Using Nonnegative Matrix Factorization , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[16]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[17]  Massimiliano Pontil,et al.  Regularized multi--task learning , 2004, KDD.

[18]  Jieping Ye,et al.  A convex formulation for learning shared structures from multiple tasks , 2009, ICML '09.

[19]  Inderjit S. Dhillon,et al.  Information-theoretic co-clustering , 2003, KDD '03.

[20]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[21]  Qiang Yang,et al.  Self-taught clustering , 2008, ICML '08.

[22]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23]  Chris H. Q. Ding,et al.  Convex and Semi-Nonnegative Matrix Factorizations , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Tong Zhang,et al.  A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res..

[25]  Quanquan Gu,et al.  Co-clustering on manifolds , 2009, KDD.

[26]  Qiang Yang,et al.  Transfer Learning via Dimensionality Reduction , 2008, AAAI.

[27]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[28]  Jiawei Han,et al.  Knowledge transfer via multiple model local structure mapping , 2008, KDD.

[29]  Lawrence Carin,et al.  Logistic regression with an auxiliary data source , 2005, ICML.

[30]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[31]  Chris H. Q. Ding,et al.  Orthogonal nonnegative matrix t-factorizations for clustering , 2006, KDD '06.

[32]  Fei Wang,et al.  Semi-Supervised Clustering via Matrix Factorization , 2008, SDM.

[33]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[34]  Chih-Jen Lin,et al.  Working Set Selection Using Second Order Information for Training Support Vector Machines , 2005, J. Mach. Learn. Res..

[35]  Massimiliano Pontil,et al.  Multi-Task Feature Learning , 2006, NIPS.

[36]  Qiang Yang,et al.  Spectral domain-transfer learning , 2008, KDD.

[37]  Qiang Yang,et al.  EigenTransfer: a unified framework for transfer learning , 2009, ICML '09.

[38]  Raymond J. Mooney,et al.  A probabilistic framework for semi-supervised clustering , 2004, KDD.

[39]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.