Multitask fuzzy Bregman co-clustering approach for clustering data with multisource features

In typical real-world clustering problems, the features extracted from the data suffer from two problems that prevent accurate clustering. First, the features extracted from the samples carry little information for clustering. Second, the feature vector is usually high-dimensional and multi-source, which results in a complex cluster structure in the feature space. In this paper, we propose combining multi-task clustering and fuzzy co-clustering techniques to overcome these two problems. In addition, the proposed algorithm uses the Bregman divergence as its notion of dissimilarity, yielding a general framework that admits any Bregman distance function consistent with the data distribution and the structure of the clusters. The experimental results indicate that the proposed algorithm overcomes both problems by managing the complexity and weakness of the features, which results in appropriate clustering performance.
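
To illustrate why the Bregman divergence gives a general notion of dissimilarity, the sketch below evaluates the standard definition d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> for two choices of the convex generator phi. It is not the proposed algorithm; the function names and sample vectors are assumptions made for this example only. The squared norm recovers the squared Euclidean distance, and the negative entropy recovers the KL divergence when both vectors sum to one.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """Standard Bregman divergence generated by a convex function phi."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# Generator 1: squared norm, which yields the squared Euclidean distance.
def sq_norm(v):
    return float(np.dot(v, v))

def sq_norm_grad(v):
    return 2.0 * v

# Generator 2: negative entropy, which yields the (generalized) KL divergence.
# Requires strictly positive entries.
def neg_entropy(v):
    return float(np.sum(v * np.log(v)))

def neg_entropy_grad(v):
    return np.log(v) + 1.0

# Hypothetical sample vectors (both are probability vectors).
x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])

d_euclid = bregman_divergence(sq_norm, sq_norm_grad, x, y)
d_kl = bregman_divergence(neg_entropy, neg_entropy_grad, x, y)

print("squared Euclidean:", d_euclid)
print("KL divergence:", d_kl)
```

Swapping the generator is the only change needed to move between the two dissimilarities, which is the kind of flexibility the abstract attributes to a Bregman-based framework.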
