Integrative analysis of multiple cancer prognosis studies with gene expression measurements

Although microarray gene profiling studies in cancer research have been successful in identifying genetic variants predisposing to the development and progression of cancer, markers identified from the analysis of single datasets often suffer from low reproducibility. Among multiple possible causes, the most important is the small sample size, and hence the lack of power, of single studies. Integrative analysis jointly considers multiple heterogeneous studies, has a significantly larger sample size, and can improve reproducibility. In this article, we focus on cancer prognosis studies, where the response variables are progression-free, overall, or other types of survival. We propose a group minimax concave penalty (GMCP) penalized integrative analysis approach for analyzing multiple heterogeneous cancer prognosis studies with microarray gene expression measurements, and we develop an efficient group coordinate descent algorithm to compute it. The GMCP automatically accommodates heterogeneity across datasets, and the identified markers have consistent effects across studies. Simulation studies show that the GMCP provides significantly improved selection results compared with existing meta-analysis approaches, intensity approaches, and group Lasso penalized integrative analysis. We apply the GMCP to four microarray studies and identify genes associated with the prognosis of breast cancer.
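
To make the group-level penalization concrete, the following is a minimal sketch of the MCP applied to a group norm and of the closed-form update that a group coordinate descent step commonly uses. It is not the article's implementation: the function names `mcp_penalty` and `group_mcp_update` are hypothetical, and the update assumes an orthonormalized within-group design and a concavity parameter `gamma > 1`. Here a "group" is taken to be the coefficients of one gene across the studies, and `z` stands for the unpenalized group-wise least squares solution.

```python
import numpy as np


def mcp_penalty(t, lam, gamma):
    """MCP evaluated at a nonnegative scalar t (here, the L2 norm of a group)."""
    if t <= gamma * lam:
        return lam * t - t ** 2 / (2.0 * gamma)
    # Beyond gamma * lam the penalty is flat, so large effects are not shrunk.
    return 0.5 * gamma * lam ** 2


def group_mcp_update(z, lam, gamma):
    """Closed-form group update (assumes orthonormal group design, gamma > 1).

    z is the unpenalized group-wise solution; the result is the penalized one.
    """
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z)
    if norm <= lam:
        # Whole group is set to zero: the gene is dropped in every study.
        return np.zeros_like(z)
    if norm <= gamma * lam:
        # Group soft-thresholding, rescaled to offset the shrinkage bias.
        return (1.0 - lam / norm) * z / (1.0 - 1.0 / gamma)
    # Large groups are left unpenalized.
    return z


if __name__ == "__main__":
    print(group_mcp_update([0.3, 0.4], lam=1.0, gamma=3.0))
    print(group_mcp_update([1.2, 1.6], lam=1.0, gamma=3.0))
    print(group_mcp_update([3.0, 4.0], lam=1.0, gamma=3.0))
```

Because the threshold acts on the norm of the whole group, a gene is either selected in all studies or in none, while the individual coefficients within a selected group remain free to differ across studies.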
