Multi-task clustering via domain adaptation

Clustering is a fundamental topic in pattern recognition and machine learning research. Traditional clustering methods deal with a single clustering task on a single data set. However, many real applications involve multiple similar clustering tasks simultaneously, e.g., clustering the clients of different shopping websites, where the data for each task are collected from different subjects. These tasks come from different domains but are closely related. It has been shown that the performance of each individual clustering task can be improved by appropriately exploiting the underlying relation among the tasks. In this paper, we propose a new approach that performs multiple related clustering tasks simultaneously through domain adaptation. A shared subspace is learned through domain adaptation, in which the gap between the distributions of the tasks is reduced, and the shared knowledge is transferred across all tasks by exploiting the strengthened relation in the learned subspace. The objective is then to find the best clustering in both the original and the learned spaces. An alternating optimization method is introduced, and its convergence is theoretically guaranteed. Experiments on both synthetic and real data sets demonstrate the effectiveness of the proposed approach.
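
To make the idea of "reducing the gap of distributions" in a learned subspace concrete, the following is a minimal sketch and not the method of the paper. The abstract does not say how the gap is measured; as an assumption, the sketch uses the squared maximum mean discrepancy with a linear kernel, which reduces to the squared distance between the sample means. The function name `linear_mmd2`, the synthetic data, and the projection matrix `W` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def linear_mmd2(X, Y, W=None):
    """Squared MMD with a linear kernel: ||mean(XW) - mean(YW)||^2.

    X, Y: arrays of shape (n_samples, n_features), one per task.
    W:    optional (n_features, k) projection onto a shared subspace.
    """
    if W is not None:
        X, Y = X @ W, Y @ W
    diff = X.mean(axis=0) - Y.mean(axis=0)
    return float(diff @ diff)


rng = np.random.default_rng(0)

# Two hypothetical tasks: identical along dimension 0,
# but task Y is shifted along dimension 1 (a domain gap).
X = rng.normal(size=(200, 2))
Y = rng.normal(size=(200, 2)) + np.array([0.0, 3.0])

# Hypothetical projection that keeps only the shared dimension 0.
# In a real method W would be learned, not fixed by hand.
W = np.array([[1.0], [0.0]])

gap_original = linear_mmd2(X, Y)
gap_projected = linear_mmd2(X, Y, W)

print(f"gap in original space:  {gap_original:.3f}")
print(f"gap in projected space: {gap_projected:.3f}")
```

In this toy setting the projection discards the dimension along which the two tasks differ, so the measured gap shrinks. A full approach along the lines of the abstract would learn such a subspace jointly with the cluster assignments, alternating between the two.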
