Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling

Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease, and different breast cancer subtypes are treated differently. Understanding how prognosis differs with the molecular and phenotypic features of a tumor is one avenue for improving care, because it allows the appropriate treatment to be matched to each molecular subtype of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis, using large datasets containing genomic and clinical information together with an online real-time leaderboard that sped feedback to the modeling team and encouraged each modeler to work towards a higher-ranked submission. We find that machine learning methods combined with molecular features selected on the basis of expert prior knowledge can improve survival predictions compared to current best-in-class methodologies, and that ensemble models trained across multiple user submissions systematically outperform the individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.
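
To make the idea of scoring and combining submissions concrete, the following is a minimal sketch and not the study's actual pipeline. The abstract names neither the scoring metric nor the ensembling method, so two assumptions are made here: submissions are scored with a concordance index on censored survival data, and the ensemble is a plain average of each submission's rank-transformed risk predictions. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs whose predicted risks are ordered
    consistently with their observed survival times (assumed metric)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times and pairs where the shorter time is censored,
        # since the true ordering of the two patients is then unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk for the patient who died earlier
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    return concordant / comparable


def to_ranks(scores):
    """Replace scores by their ranks (0 = lowest risk) so that submissions
    on different scales can be averaged."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0] * len(scores)
    for rank, k in enumerate(order):
        ranks[k] = rank
    return ranks


def ensemble_risk(submissions):
    """Average the rank-transformed risks across submissions (assumed method)."""
    ranked = [to_ranks(s) for s in submissions]
    return [sum(col) / len(ranked) for col in zip(*ranked)]


if __name__ == "__main__":
    # Hypothetical toy data: survival time, event flag (1 = death observed,
    # 0 = censored), and risk predictions from three made-up submissions.
    times = [2, 4, 6, 8, 10]
    events = [1, 1, 0, 1, 1]
    submissions = [
        [0.9, 0.4, 0.6, 0.3, 0.1],
        [12.0, 15.0, 7.0, 9.0, 2.0],
        [5, 4, 1, 3, 2],
    ]
    for n, risks in enumerate(submissions, start=1):
        print(f"submission {n}:", concordance_index(times, events, risks))
    print("ensemble:", concordance_index(times, events, ensemble_risk(submissions)))
```

Rank-transforming before averaging is one simple way to combine models whose outputs are on different scales; a leaderboard of the kind described above would apply the same scoring function to every submission against held-out outcomes.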
