Self-Adapted Multi-Task Clustering

Multi-task clustering improves the clustering performance of each task by transferring knowledge across related tasks. Most existing multi-task clustering methods rest on the idealized assumption that the tasks are completely related. In many real applications, however, the tasks are only partially related, and brute-force transfer may cause negative transfer, which degrades clustering performance. In this paper, we propose a self-adapted multi-task clustering (SAMTC) method that automatically identifies and transfers reusable instances among the tasks, thus avoiding negative transfer. SAMTC first initializes by performing single-task clustering on each task and then executes three steps. First, for each pair of tasks, it finds the reusable instances by measuring the relatedness of clusters with the Jensen-Shannon divergence, obtaining a pair of possibly related subtasks. Second, it estimates the relatedness between each pair of subtasks with kernel mean matching. Third, it constructs the similarity matrix for each task by exploiting useful information from the other tasks through instance transfer, and applies spectral clustering to obtain the final clustering result. Experimental results on several real data sets show the superiority of the proposed algorithm over traditional single-task clustering methods and existing multi-task clustering methods.
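The first step, matching clusters across two tasks by Jensen-Shannon divergence, can be illustrated with a minimal sketch. It is not the SAMTC implementation: it assumes each cluster is already summarized as a discrete distribution (for example, a normalized term-frequency vector), and the function names, the mutual-nearest-neighbor rule, and the `threshold` parameter are hypothetical choices made only for this example.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, so the value lies in [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def related_cluster_pairs(dists_a, dists_b, threshold=0.1):
    """Pair clusters of task A and task B that are mutual nearest
    neighbors under JS divergence and closer than `threshold`.

    Both the mutual-nearest rule and the threshold are illustrative
    assumptions, not taken from the paper.
    """
    d = np.array([[js_divergence(pa, pb) for pb in dists_b] for pa in dists_a])
    pairs = []
    for i in range(d.shape[0]):
        j = int(np.argmin(d[i]))
        if int(np.argmin(d[:, j])) == i and d[i, j] <= threshold:
            pairs.append((i, j))
    return pairs


# Toy cluster distributions over three features (made-up numbers).
task_a = [[8, 1, 1], [1, 8, 1]]
task_b = [[1, 7, 2], [7, 2, 1], [1, 1, 8]]
pairs = related_cluster_pairs(task_a, task_b)
```

In this toy input the third cluster of task B has no counterpart in task A, so it is left unmatched. That is the partially related situation the abstract describes: only instances from matched clusters would be candidates for transfer.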
