Multitask possibilistic and fuzzy co-clustering algorithm for clustering data with multisource features

Practical clustering problems often run into two major difficulties. The first arises from poorly extracted feature sets: individual features may be weak, and the feature vectors are typically high-dimensional and drawn from multiple sources. The second is that outliers interfere with the clustering results. In this paper, we use co-clustering to cluster the data and the feature sources simultaneously, and we adopt a multitask approach in which information shared between tasks improves the accuracy of each clustering task. We also exploit the typicality degree to construct a new parameter selection index that identifies outliers, and we correct each parameter by weakening the influence of the identified outliers on the clustering results. To demonstrate the applicability and robustness of the algorithm, we extend it to non-precise datasets and evaluate it from multiple aspects through experiments. The experiments show that the proposed algorithms not only improve clustering accuracy but also greatly reduce the interference of outliers with the clustering results.
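To make the role of typicality in outlier identification concrete, the following is a minimal sketch that assumes the standard possibilistic c-means typicality, t_ik = 1 / (1 + (d_ik^2 / eta_i)^(1/(m-1))). It is not the parameter selection index proposed in this paper, which the abstract does not specify. The function names, the fixed cluster centers, the scale parameters `eta`, and the `threshold` value are all hypothetical choices made for illustration.

```python
import numpy as np

def typicality(X, centers, eta, m=2.0):
    """Standard possibilistic typicality t[i, k] of sample k to cluster i.

    Assumption: this is the textbook PCM form, not the paper's own index.
    """
    # Squared Euclidean distances, shape (n_clusters, n_samples).
    d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))

def flag_outliers(T, threshold=0.1):
    """Flag samples whose highest typicality over all clusters is below threshold.

    The threshold is a hypothetical value chosen for this toy example.
    """
    return T.max(axis=0) < threshold

# Toy data: two tight groups plus one point far from both.
X = np.array([[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [5.1, 4.9], [20.0, -20.0]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])  # assumed known for the sketch
eta = np.array([1.0, 1.0])                    # assumed per-cluster scale

T = typicality(X, centers, eta)
outliers = flag_outliers(T)
print(outliers)
```

Unlike fuzzy memberships, typicalities are not forced to sum to one across clusters, so a point far from every center receives a low value for all of them. That property is what allows a typicality-based rule to single out the last sample here, and down-weighting such samples is one plausible way to weaken their influence on parameter updates.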
