A new link-based method to ensemble clustering and cancer microarray data analysis

Ensemble clustering, also known as cluster ensembles, has been shown to improve accuracy over any single standard clustering algorithm. This meta-learning formalism spares users the dilemma of selecting an appropriate technique, and the parameters for that technique, for a given set of data. It has proven effective in many problem domains, especially microarray data analysis. Among state-of-the-art methods, the link-based approach (LCE) recently introduced by Iam-On et al. (2011) provides highly accurate clustering. This paper presents an improvement of LCE built on a newly developed link-based metric, which includes in the similarity assessment additional information that is already available in the network. This refinement can increase the quality of the similarity measures and hence of the resulting cluster decision. The performance of the improved LCE is evaluated on synthetic and UCI benchmark datasets, in comparison with the original LCE and several well-known cluster ensemble techniques...
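To make the cluster ensemble idea concrete, the sketch below combines several base clusterings through a plain co-association matrix, in the style of evidence accumulation. It is not the LCE method or the new link-based metric described above, whose details are not given here; the function names, the 0.5 threshold, and the toy partitions are all assumptions chosen for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base clusterings that put each pair of points together.

    `partitions` is a list of label lists, one per base clustering,
    all of the same length (hypothetical input format).
    """
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    # Normalise pair counts by the ensemble size; a point always co-occurs with itself.
    return [[1.0 if i == j else counts[i][j] / m for j in range(n)]
            for i in range(n)]


def consensus_labels(ca, threshold=0.5):
    """Merge points whose co-association exceeds `threshold` (assumed value).

    Uses a small union-find, so the result is the set of connected
    components of the thresholded co-association graph.
    """
    n = len(ca)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if ca[i][j] > threshold:
            parent[find(j)] = find(i)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three toy base clusterings of five data points (made-up labels).
partitions = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
ca = co_association(partitions)
final = consensus_labels(ca)
print(final)
```

A link-based refinement of the kind the abstract describes would replace the raw pair counts with a similarity that also draws on relations between clusters; the co-association matrix here only serves as a baseline to show where such a measure plugs in.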

[1] J.A.F. Costa, et al. Cluster analysis using self-organizing maps and image processing techniques, 1999, IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028).

[2] David G. Stork, et al. Pattern Classification (2nd ed.), 1999.

[3] Rainer Spang, et al. Diagnostic signatures from microarrays: a bioinformatics concept for personalized medicine, 2003, Drug discovery today.

[4] Jiri Matas, et al. On Combining Classifiers, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[5] Tossapon Boongoen, et al. Refining Pairwise Similarity Matrix for Cluster Ensemble Problem with Cluster Relations, 2008, Discovery Science.

[6] Anil K. Jain, et al. Clustering ensembles: models of consensus and weak partitions, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Ludmila I. Kuncheva, et al. Moderate diversity for better cluster ensembles, 2006, Inf. Fusion.

[8] Carlotta Domeniconi, et al. Weighted cluster ensembles: Methods and analysis, 2009, TKDD.

[9] Tossapon Boongoen, et al. A Link-Based Approach to the Cluster Ensemble Problem, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10] Daniel A. Ashlock, et al. MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering, 2009, BMC Bioinformatics.

[11] Zhiwen Yu, et al. Graph-based consensus clustering for class discovery from gene expression data, 2007, Bioinform.

[12] Joydeep Ghosh, et al. Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions, 2002, J. Mach. Learn. Res.

[13] George Karypis, et al. Multilevel k-way Partitioning Scheme for Irregular Graphs, 1998, J. Parallel Distributed Comput.

[14] Ruey-Shun Chen, et al. Data Mining Application in Customer Relationship Management of Credit Card Business, 2005, COMPSAC.

[15] Aristides Gionis, et al. Clustering Aggregation, 2005, ICDE.

[16] Geoffrey J. McLachlan, et al. A mixture model-based approach to the clustering of microarray expression data, 2002, Bioinform.

[17] Carla E. Brodley, et al. Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach, 2003, ICML.

[18] Alexander Schliep, et al. Clustering cancer gene expression data: a comparative study, 2008, BMC Bioinformatics.

[19] Sandrine Dudoit, et al. Bagging to Improve the Accuracy of A Clustering Procedure, 2003, Bioinform.

[20] Thomas H. Wonnacott, et al. Introductory Statistics, 2007, Technometrics.

[21] Tossapon Boongoen, et al. Disclosing false identity through hybrid link analysis, 2010, Artificial Intelligence and Law.

[22] Mari Ostendorf, et al. Combining Multiple Clustering Systems, 2004, PKDD.

[23] Joachim M. Buhmann, et al. Bagging for Path-Based Clustering, 2003, IEEE Trans. Pattern Anal. Mach. Intell.

[24] Ludmila I. Kuncheva, et al. Evaluation of Stability of k-Means Cluster Ensembles with Respect to Random Initialization, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] Xiaohua Hu, et al. Cluster Ensemble and Its Applications in Gene Expression Analysis, 2004, APBC.

[26] Jill P. Mesirov, et al. Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data, 2003, Machine Learning.

[27] Amit Konar, et al. Automatic kernel clustering with a Multi-Elitist Particle Swarm Optimization Algorithm, 2008, Pattern Recognit. Lett.

[28] Michael I. Jordan, et al. On Spectral Clustering: Analysis and an algorithm, 2001, NIPS.

[29] Qiang Yang, et al. Discriminatively regularized least-squares classification, 2009, Pattern Recognit.

[30] Jitender S. Deogun, et al. Conceptual clustering in information retrieval, 1998, IEEE Trans. Syst. Man Cybern. Part B.

[31] Ana L. N. Fred, et al. Combining multiple clusterings using evidence accumulation, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32] Aidong Zhang, et al. Cluster analysis for gene expression data: a survey, 2004, IEEE Transactions on Knowledge and Data Engineering.

[33] Rich Caruana, et al. Consensus Clusterings, 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[34] Javed Mostafa, et al. Information Retrieval by Semantic Analysis and Visualization of the Concept Space of D-Lib Magazine, 2002, D Lib Mag.

[35] Lipika Dey, et al. A k-mean clustering algorithm for mixed numeric and categorical data, 2007, Data Knowl. Eng.

[36] Carla E. Brodley, et al. Solving cluster ensemble problems by bipartite graph partitioning, 2004, ICML.

[37] Anil K. Jain, et al. A Mixture Model for Clustering Ensembles, 2004, SDM.

[38] William M. Rand, et al. Objective Criteria for the Evaluation of Clustering Methods, 1971.

[39] Tossapon Boongoen, et al. LCE: a link-based cluster ensemble method for improved gene expression data analysis, 2010, Bioinform.

[40] Kyoung-jae Kim, et al. A recommender system using GA K-means clustering in an online shopping market, 2008, Expert Syst. Appl.

[41] Tossapon Boongoen, et al. Extending Data Reliability Measure to a Filter Approach for Soft Subspace Clustering, 2011, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[42] Mohamed S. Kamel, et al. Finding Natural Clusters Using Multi-clusterer Combiner Based on Shared Nearest Neighbors, 2003, Multiple Classifier Systems.