Galgo: a bi-objective evolutionary meta-heuristic identifies robust transcriptomic classifiers associated with patient outcome across multiple cancer types

MOTIVATION Statistical and machine learning analyses of tumor transcriptomic profiles offer a powerful resource for gaining a deeper understanding of tumor subtypes and disease prognosis. Currently, prognostic gene expression signatures do not exist for all cancer types, and most of those developed to date have been optimized for individual tumor types. In Galgo, we implement a bi-objective optimization approach that prioritizes gene signature cohesiveness and patient survival in parallel, which provides greater power to identify tumor transcriptomic phenotypes strongly associated with patient survival.

RESULTS To compare the predictive power of the signatures obtained by Galgo with that of previously studied subtyping methods, we used a meta-analytic approach, testing a total of 35 large population-based transcriptomic biobanks across 4 cancer types. Galgo-generated colorectal and lung adenocarcinoma signatures were stronger predictors of patient survival than published molecular classification schemes. One Galgo-generated breast cancer signature outperformed the PAM50, AIMS, SCMGENE, and IntClust subtyping predictors. In high-grade serous ovarian cancer, Galgo signatures achieved predictive power similar to that of a consensus classification method. In all cases, Galgo subtypes showed enrichment of gene sets related to the hallmarks of the disease, which highlights the biological relevance of the partitions found.

AVAILABILITY The open-source R package is available at www.github.com/harpomaxx/galgo.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
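The bi-objective idea in the abstract, scoring each candidate gene signature on both cluster cohesiveness and survival separation, can be illustrated with a small Pareto-front selection. This is a minimal sketch and not Galgo's implementation: the abstract does not describe the search procedure, the function names `dominates` and `pareto_front` are hypothetical, and the candidate scores are made-up numbers in which both objectives are assumed to be maximized.

```python
from typing import List, Tuple

# Hypothetical score pair: (cohesiveness, survival_separation).
# Both objectives are assumed to be maximized in this sketch.
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """Return True if `a` is at least as good as `b` on both
    objectives and strictly better on at least one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(scores: List[Score]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Illustrative (invented) scores for five candidate signatures.
candidates = [(0.30, 1.2), (0.25, 2.0), (0.20, 1.0), (0.35, 0.8), (0.25, 1.5)]
front = pareto_front(candidates)
print(front)  # [0, 1, 3]
```

Candidates 2 and 4 are dropped because another signature matches or beats them on both objectives. The three survivors represent different trade-offs between cohesiveness and survival separation, which is why a bi-objective search yields a set of signatures and not a single optimum.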
