Modeling intra‐tumor protein expression heterogeneity in tissue microarray experiments

Tissue microarrays (TMAs) measure tumor-specific protein expression via high-density immunohistochemical staining assays. They provide a proteomic platform for validating cancer biomarkers emerging from large-scale DNA microarray studies. Repeated observations within each tumor exhibit substantial biological and experimental variability. This variability is usually ignored when associating TMA expression data with patient survival outcome, which leads to biased estimates of the hazard ratio in proportional hazards models. We propose a Latent Expression Index (LEI) as a surrogate protein expression estimate in a two-stage analysis. Several estimators of the LEI are compared: an empirical Bayes estimator, a full Bayes estimator, and a varying replicate number estimator. In addition, we jointly model survival and TMA expression data via a shared random effects model. Bayesian estimation is carried out using a Markov chain Monte Carlo method. Simulation studies were conducted to compare the two-stage methods and the joint analysis in estimating the Cox regression coefficient. We show that the two-stage methods reduce bias relative to the naive approach, but still underestimate the hazard ratio. The joint model consistently outperforms the two-stage methods in terms of both bias and coverage across various simulation scenarios. In case studies using prostate cancer TMA data sets, the two-stage methods yield a good approximation in one data set but an insufficient one in the other. Our general advice is to rely on the joint model inference whenever results differ between the two-stage methods and the joint analysis.
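The first stage of a two-stage analysis can be illustrated with a minimal sketch of empirical Bayes shrinkage. It assumes a one-way random effects model for replicate core measurements (tumor-level effect plus core-level noise) and method-of-moments variance estimates; neither assumption is stated in the abstract, and the function name `empirical_bayes_index` is hypothetical, so this should be read as an illustration of the general idea rather than the authors' LEI estimator.

```python
import numpy as np

def empirical_bayes_index(cores):
    """Shrink each tumor's mean core measurement toward the grand mean.

    cores: list of 1-D arrays, one per tumor; replicate counts may differ.
    Assumes at least two tumors and at least one tumor with 2+ cores.
    Returns (shrunken estimates, shrinkage weights).
    """
    cores = [np.asarray(c, dtype=float) for c in cores]
    k = len(cores)
    n = np.array([len(c) for c in cores], dtype=float)
    means = np.array([c.mean() for c in cores])

    # Pooled within-tumor variance (core-to-core variability).
    ss_within = sum(((c - m) ** 2).sum() for c, m in zip(cores, means))
    sigma2 = ss_within / (n.sum() - k)

    # Grand mean, weighting each tumor by its number of cores.
    mu = np.average(means, weights=n)

    # Method-of-moments between-tumor variance for unbalanced data,
    # truncated at zero.
    n0 = (n.sum() - (n ** 2).sum() / n.sum()) / (k - 1)
    ms_between = (n * (means - mu) ** 2).sum() / (k - 1)
    tau2 = max((ms_between - sigma2) / n0, 0.0)

    # Tumors with fewer cores have noisier means and are shrunk more.
    shrink = tau2 / (tau2 + sigma2 / n)
    return mu + shrink * (means - mu), shrink

# Example with three tumors having 2, 3, and 1 cores (made-up values).
estimates, weights = empirical_bayes_index(
    [[1.0, 3.0], [5.0, 7.0, 6.0], [10.0]]
)
print(estimates, weights)
```

In a two-stage analysis, shrunken values of this kind would then enter a Cox model as a covariate. Because they are still plugged in as if known, the second stage does not account for their remaining uncertainty, which is consistent with the abstract's finding that two-stage methods reduce but do not eliminate the bias.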
