Dissecting cancer heterogeneity based on dimension reduction of transcriptomic profiles using extreme learning machines

It is becoming increasingly clear that major malignancies such as breast, colorectal and gastric cancers are not single disease entities but comprise multiple subtypes with distinct molecular properties. Molecular subtyping has been widely used to dissect inter-tumor biological heterogeneity in relation to clinical outcomes. A key step of this methodology is unsupervised classification of gene expression profiles, which often suffers from high dimensionality, feature redundancy, noise and irrelevant information. To overcome these limitations, we propose ELM-CC, which employs hidden observation features obtained from extreme learning machines (ELMs) for cancer classification. To demonstrate its effectiveness and usefulness, we applied ELM-CC to gastric and ovarian cancer subtyping. Compared with the widely used consensus clustering method, our approach demonstrated much better clustering performance and identified molecular subtypes that are much more clinically relevant.
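
The general idea of clustering samples in an ELM hidden-feature space rather than in the raw gene space can be illustrated with a minimal sketch. This is not the ELM-CC implementation: the function names `elm_hidden_features` and `kmeans`, the sigmoid activation, the weight scaling, the use of plain k-means in place of consensus clustering, and the synthetic expression matrix are all assumptions made for illustration only.

```python
import numpy as np

def elm_hidden_features(X, n_hidden, seed=0):
    """Map samples to hidden-layer activations using random, untrained weights.

    Assumption: a single sigmoid hidden layer with Gaussian random weights,
    as in a basic ELM; the paper's exact feature construction may differ.
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    W = rng.standard_normal((n_genes, n_hidden)) / np.sqrt(n_genes)
    b = rng.standard_normal(n_hidden)
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def kmeans(H, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means, standing in for the clustering step (hypothetical)."""
    rng = np.random.default_rng(seed)
    # Fancy indexing returns a copy, so updating centers does not modify H.
    centers = H[rng.choice(len(H), size=k, replace=False)]
    labels = np.zeros(len(H), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        dists = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = H[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels

# Synthetic stand-in for an expression matrix: 40 samples x 200 genes,
# with the first 20 samples shifted in the first 50 genes.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 200))
X[:20, :50] += 2.0

H = elm_hidden_features(X, n_hidden=10)   # 200 genes reduced to 10 hidden features
labels = kmeans(H, k=2)                   # cluster in the reduced space
```

In this sketch the dimension reduction comes entirely from the fixed random projection and nonlinearity; no weights are trained, which is what keeps the feature-extraction step cheap relative to clustering directly on thousands of genes.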
