New cluster ensemble approach to integrative biological data analysis

Traditional cancer prognosis has relied mainly on clinical data. However, this classic approach may be ineffective for analysing morphologically indistinguishable tumour subtypes. Microarray technology has therefore emerged as a promising alternative. Despite a large number of microarray studies, the clinical application of gene expression data analysis remains limited owing to the complexity and noise level of the generated data. Recently, integrative cluster analysis of both clinical and gene expression data has been shown to be an effective way to overcome these problems. This paper presents a novel cluster ensemble method for the accurate analysis of heterogeneous biological data. Evaluation on real biological and benchmark data sets suggests that the quality of the proposed model is higher than that of many state-of-the-art cluster ensemble techniques and standard clustering algorithms.
