Smart Multitask Bregman Clustering and Multitask Kernel Clustering

Traditional clustering algorithms deal with a single clustering task on a single dataset. However, there are many related tasks in the real world, which motivates multitask clustering. Recently some multitask clustering algorithms have been proposed, and among them multitask Bregman clustering (MBC) is a very applicable method. MBC alternatively updates clusters and learns relationships between clusters of different tasks, and the two phases boost each other. However, the boosting does not always have positive effects on improving the clustering performance, it may also cause negative effects. Another issue of MBC is that it cannot deal with nonlinear separable data. In this article, we show that in MBC, the process of using cluster relationship to boost the cluster updating phase may cause negative effects, that is, cluster centroids may be skewed under some conditions. We propose a smart multitask Bregman clustering (S-MBC) algorithm which can identify the negative effects of the boosting and avoid the negative effects if they occur. We then propose a multitask kernel clustering (MKC) framework for nonlinear separable data by using a similar framework like MBC in the kernel space. We also propose a specific optimization method, which is quite different from that of MBC, to implement the MKC framework. Since MKC can also cause negative effects like MBC, we further extend the framework of MKC to a smart multitask kernel clustering (S-MKC) framework in a similar way that S-MBC is extended from MBC. We conduct experiments on 10 real world multitask clustering datasets to evaluate the performance of S-MBC and S-MKC. The results on clustering accuracy show that: (1) compared with the original MBC algorithm MBC, S-MBC and S-MKC perform much better; (2) compared with the convex discriminative multitask relationship clustering (DMTRC) algorithms DMTRC-L and DMTRC-R which also avoid negative transfer, S-MBC and S-MKC perform worse in the (ideal) case in which different tasks have the same cluster number and the empirical label marginal distribution in each task distributes evenly, but better or comparable in other (more general) cases. Moreover, S-MBC and S-MKC can work on the datasets in which different tasks have different number of clusters, violating the assumptions of DMTRC-L and DMTRC-R. The results on efficiency show that S-MBC and S-MKC consume more computational time than MBC and less computational time than DMTRC-L and DMTRC-R. Overall S-MBC and S-MKC are competitive compared with the state-of-the-art multitask clustering algorithms in synthetical terms of accuracy, efficiency and applicability.

[1]  Jie Zhou,et al.  Multi-task clustering via domain adaptation , 2012, Pattern Recognit..

[2]  Jianwen Zhang,et al.  Multitask Bregman clustering , 2010, Neurocomputing.

[3]  Massimiliano Pontil,et al.  Multi-Task Feature Learning , 2006, NIPS.

[4]  Jean-Philippe Vert,et al.  Clustered Multi-Task Learning: A Convex Formulation , 2008, NIPS.

[5]  Inderjit S. Dhillon,et al.  Generalized Nonnegative Matrix Approximations with Bregman Divergences , 2005, NIPS.

[6]  Alexander J. Smola,et al.  Support Vector Machine Reference Manual , 1998 .

[7]  Hongliang Fei,et al.  Structured feature selection and task relationship inference for multi-task learning , 2011, 2011 IEEE 11th International Conference on Data Mining.

[8]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[9]  Charles A. Micchelli,et al.  Learning Multiple Tasks with Kernel Methods , 2005, J. Mach. Learn. Res..

[10]  Frank Nielsen,et al.  Sided and Symmetrized Bregman Centroids , 2009, IEEE Transactions on Information Theory.

[11]  Korris Fu-Lai Chung,et al.  Transfer Spectral Clustering , 2012, ECML/PKDD.

[12]  Qiang Yang,et al.  Spectral domain-transfer learning , 2008, KDD.

[13]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[14]  Charles A. Micchelli,et al.  Kernels for Multi--task Learning , 2004, NIPS.

[15]  L. Bregman The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming , 1967 .

[16]  Leonidas J. Guibas,et al.  A metric for distributions with applications to image databases , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[17]  Thach Huy Nguyen,et al.  A feature-free and parameter-light multi-task clustering framework , 2012, Knowledge and Information Systems.

[18]  Dit-Yan Yeung,et al.  A Convex Formulation for Learning Task Relationships in Multi-Task Learning , 2010, UAI.

[19]  Jiawei Han,et al.  Learning a Kernel for Multi-Task Clustering , 2011, AAAI.

[20]  Thach Huy Nguyen,et al.  A Compression-Based Dissimilarity Measure for Multi-task Clustering , 2011, ISMIS.

[21]  Qiang Yang,et al.  Transfer Learning via Dimensionality Reduction , 2008, AAAI.

[22]  Xiao-Lei Zhang,et al.  Convex Discriminative Multitask Clustering , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Jieping Ye,et al.  A convex formulation for learning shared structures from multiple tasks , 2009, ICML '09.

[24]  Qiang Yang,et al.  Boosting for transfer learning , 2007, ICML '07.

[25]  Jiayu Zhou,et al.  Clustered Multi-Task Learning Via Alternating Structure Optimization , 2011, NIPS.

[26]  Ramesh Nallapati,et al.  A Comparative Study of Methods for Transductive Transfer Learning , 2007, Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007).

[27]  Dit-Yan Yeung,et al.  A Regularization Approach to Learning Task Relationships in Multitask Learning , 2014, ACM Trans. Knowl. Discov. Data.

[28]  Massimiliano Pontil,et al.  Regularized multi--task learning , 2004, KDD.

[29]  Qiang Yang,et al.  Co-clustering based classification for out-of-domain documents , 2007, KDD '07.

[30]  Avishek Saha,et al.  Online Learning of Multiple Tasks and Their Relationships , 2011, AISTATS.

[31]  Joydeep Ghosh,et al.  A Unified Framework for Model-based Clustering , 2003, J. Mach. Learn. Res..

[32]  Chris H. Q. Ding,et al.  Convex and Semi-Nonnegative Matrix Factorizations , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Inderjit S. Dhillon,et al.  Clustering with Bregman Divergences , 2005, J. Mach. Learn. Res..

[34]  Xianchao Zhang,et al.  Smart Multi-Task Bregman Clustering and Multi-Task Kernel Clustering , 2013, AAAI.

[35]  Hongtao Lu,et al.  Multi-task co-clustering via nonnegative matrix factorization , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[36]  Xin Liu,et al.  Document clustering based on non-negative matrix factorization , 2003, SIGIR.

[37]  Quanquan Gu,et al.  Learning the Shared Subspace for Multi-task Clustering and Transductive Transfer Classification , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[38]  Qiang Yang,et al.  Self-taught clustering , 2008, ICML '08.

[39]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[40]  Lawrence Carin,et al.  Logistic regression with an auxiliary data source , 2005, ICML.

[41]  Rajat Raina,et al.  Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[42]  Tom Heskes,et al.  Task Clustering and Gating for Bayesian Multitask Learning , 2003, J. Mach. Learn. Res..

[43]  Tong Zhang,et al.  A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res..

[44]  Jiawei Han,et al.  ACM Transactions on Knowledge Discovery from Data: Introduction , 2007 .

[45]  Hongliang Fei,et al.  Structured Feature Selection and Task Relationship Inference for Multi-task Learning , 2011, ICDM.

[46]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[47]  Dit-Yan Yeung,et al.  Multi-Task Boosting by Exploiting Task Relationships , 2012, ECML/PKDD.

[48]  Edwin V. Bonilla,et al.  Multi-task Gaussian Process Prediction , 2007, NIPS.

[49]  Neil D. Lawrence,et al.  Learning to learn with the informative vector machine , 2004, ICML.

[50]  Massimiliano Pontil,et al.  Exploiting Unrelated Tasks in Multi-Task Learning , 2012, AISTATS.

[51]  Yin Zhang,et al.  Solving large-scale linear programs by interior-point methods under the Matlab ∗ Environment † , 1998 .

[52]  Thomas G. Dietterich,et al.  Improving SVM accuracy by training on auxiliary data sources , 2004, ICML.

[53]  Dit-Yan Yeung,et al.  Transfer Metric Learning with Semi-Supervised Extension , 2012, TIST.