Tumor clustering based on hybrid cluster ensemble framework

Tumor clustering from bio-molecular data provides a new way to perform cancer class discovery. In this paper, we propose a hybrid fuzzy cluster ensemble framework (HFCEF) for tumor clustering from cancer gene expression data. Unlike traditional cluster ensemble frameworks, HFCEF integrates both hard clustering and soft clustering into the ensemble. Specifically, HFCEF first applies the affinity propagation (AP) algorithm to cluster the attribute dimension, generating a set of subspaces that are used to create a set of new datasets. Then, a fuzzy membership function and the affinity propagation algorithm are used to generate a set of fuzzy matrices in the ensemble. Finally, the normalized cut algorithm serves as the consensus function, summarizing the set of fuzzy matrices to obtain the final result. Experiments on cancer gene expression profiles show that the proposed framework works well on bio-molecular data and provides more robust, stable and accurate results.
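The last two stages (fuzzy matrices, then a normalized-cut consensus) can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not specify the membership function or how the fuzzy matrices are combined, so the fuzzy-c-means-style membership, the averaged U·Uᵀ similarity, the brute-force two-way cut search, and all function names below are assumptions made for illustration. The exemplars are picked by hand here; in HFCEF they would come from affinity propagation.

```python
from itertools import combinations

def fuzzy_memberships(points, exemplars, m=2.0):
    """Assumed fuzzy-c-means-style membership of each point to each exemplar."""
    rows = []
    for p in points:
        d = [sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5 for c in exemplars]
        if 0.0 in d:
            # The point coincides with an exemplar: give it crisp membership.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            rows.append([h / sum(hits) for h in hits])
            continue
        inv = [x ** (-2.0 / (m - 1.0)) for x in d]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows

def consensus_similarity(fuzzy_matrices):
    """Average U * U^T over the ensemble (one assumed way to merge fuzzy matrices)."""
    n = len(fuzzy_matrices[0])
    S = [[0.0] * n for _ in range(n)]
    for U in fuzzy_matrices:
        for i in range(n):
            for j in range(n):
                S[i][j] += sum(a * b for a, b in zip(U[i], U[j]))
    return [[v / len(fuzzy_matrices) for v in row] for row in S]

def ncut_value(S, part_a):
    """Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    n = len(S)
    a = set(part_a)
    b = set(range(n)) - a
    cut = sum(S[i][j] for i in a for j in b)
    assoc_a = sum(S[i][j] for i in a for j in range(n))
    assoc_b = sum(S[i][j] for i in b for j in range(n))
    return cut / assoc_a + cut / assoc_b

def best_bipartition(S):
    """Exhaustive search for the two-way split with the smallest Ncut value.

    Only feasible for toy inputs; practical normalized cut solves a
    generalized eigenproblem instead.
    """
    n = len(S)
    best = None
    for size in range(1, n // 2 + 1):
        for part in combinations(range(n), size):
            score = ncut_value(S, part)
            if best is None or score < best[0]:
                best = (score, part)
    return best

# Toy data: two well-separated groups of three samples each (hypothetical values).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Two ensemble members with different hand-picked exemplar sets.
ensemble = [
    fuzzy_memberships(points, [points[0], points[3]]),
    fuzzy_memberships(points, [points[1], points[4], points[5]]),
]

S = consensus_similarity(ensemble)
score, part = best_bipartition(S)
print("one side of the best split:", part, "Ncut value:", round(score, 4))
```

Note that the ensemble members may use different numbers of clusters: because each fuzzy matrix is reduced to a sample-by-sample similarity before averaging, no label alignment between members is needed in this sketch.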
