Identification of Breast Cancer Prognosis Markers Using Integrative Sparse Boosting

OBJECTIVES In breast cancer research, it is important to identify genomic markers associated with prognosis. Multiple microarray gene expression profiling studies have been conducted to search for such markers. Markers identified from the analysis of single datasets often lack reproducibility because of small sample sizes. Integrative analysis of data from multiple independent studies pools a larger sample and may provide a cost-effective solution.

METHODS We collect four breast cancer prognosis studies with gene expression measurements. An accelerated failure time (AFT) model with an unknown error distribution is adopted to describe survival. An integrative sparse boosting approach is employed for marker selection. The proposed model and boosting approach can effectively accommodate heterogeneity across multiple studies and identify genes with consistent effects.

RESULTS A simulation study shows that the proposed approach outperforms alternatives, including meta-analysis and intensity approaches, by identifying the majority or all of the true positives while maintaining a low false positive rate. In the analysis of the breast cancer data, 44 genes are identified as associated with prognosis. Many of the identified genes have previously been suggested as associated with tumorigenesis and cancer prognosis. The identified genes and the corresponding predicted risk scores differ from those obtained using alternative approaches. Monte Carlo-based prediction evaluation suggests that the proposed approach has the best prediction performance.

CONCLUSIONS Integrative analysis may provide an effective way of identifying breast cancer prognosis markers. Markers identified using the integrative sparse boosting analysis have sound biological implications and satisfactory prediction performance.
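
To make the METHODS summary more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the AFT model takes the usual form log T = α + X'β + ε, that censoring is handled with Kaplan-Meier (Stute) weights in a weighted least-squares loss, and that each study has its own coefficient for every gene while the gene to update is chosen jointly across studies. The function names `stute_weights` and `integrative_boost`, the step size `nu`, and the fixed number of steps are illustrative assumptions. The sparsity-inducing selection and stopping criterion of the actual sparse boosting method is deliberately omitted, so this is plain componentwise boosting with shared gene selection.

```python
import numpy as np


def stute_weights(time, event):
    """Kaplan-Meier (Stute) weights, returned in the original observation order.

    time  : observed (possibly censored) times
    event : 1 if the event was observed, 0 if censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    n = len(time)
    # Sort by time; among tied times, place events before censored observations.
    order = np.lexsort((1 - event, time))
    d = event[order]
    i = np.arange(1, n + 1)
    # Factor contributed by the i-th ordered observation to later weights.
    factors = ((n - i) / (n - i + 1.0)) ** d
    # Product of the factors of all strictly earlier ordered observations.
    prev_prod = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    w_sorted = d / (n - i + 1.0) * prev_prod
    w = np.empty(n)
    w[order] = w_sorted
    return w


def integrative_boost(Xs, ys, ws, n_steps=50, nu=0.1):
    """Componentwise weighted least-squares boosting over several studies.

    Xs : list of (n_m, p) expression matrices, same p genes in each study
    ys : list of log survival times (assumed AFT response)
    ws : list of per-observation weights, e.g. from stute_weights
    Returns an (M, p) matrix with one coefficient per study and gene.
    """
    M, p = len(Xs), Xs[0].shape[1]
    beta = np.zeros((M, p))
    Xc, res = [], []
    for X, y, w in zip(Xs, ys, ws):
        wn = w / w.sum()
        Xc.append(X - wn @ X)   # weighted centering removes the intercept
        res.append(y - wn @ y)
    for _ in range(n_steps):
        gain = np.zeros(p)
        step = np.zeros((M, p))
        for m in range(M):
            num = (ws[m] * res[m]) @ Xc[m]
            den = ws[m] @ Xc[m] ** 2 + 1e-12
            step[m] = num / den        # per-study weighted LS slope for each gene
            gain += num ** 2 / den     # reduction in weighted RSS, summed over studies
        j = int(np.argmax(gain))       # one gene is selected for all studies
        for m in range(M):
            beta[m, j] += nu * step[m, j]
            res[m] = res[m] - nu * step[m, j] * Xc[m][:, j]
    return beta
```

In this sketch a selected gene receives a separate coefficient in every study, which is one simple way to allow heterogeneous effect sizes while still identifying a common set of genes. A typical call would compute `ws = [stute_weights(t, e) for t, e in ...]` per study, pass the log of the observed times as `ys`, and then read off the genes whose columns of the returned matrix are nonzero.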
