Transcriptional Profiles from Paired Normal Samples Offer Complementary Information on Cancer Patient Survival – Evidence from TCGA Pan-Cancer Data

Although normal tissue samples adjacent to tumors are sometimes collected from patients in cancer studies, they are often used only as normal controls to identify genes differentially expressed between tumor and normal samples. However, paired normal samples are in general more difficult to obtain and to define clearly, and it remains unclear whether they should be treated as “normal” given their close proximity to tumors. In this article, by analyzing the data accrued in The Cancer Genome Atlas (TCGA), we report the surprising result that paired normal samples are in general more informative about patient survival than tumor samples. Several lines of evidence suggest that this is likely due to the tumor microenvironment rather than to tumor cell contamination or a field cancerization effect. Pathway analyses suggest that the tumor microenvironment may play an important role in cancer patient survival by boosting either adjacent metabolism or in situ immunization. Our results point to the potential benefit of collecting and profiling matched normal tissues to gain more insight into disease etiology and patient progression.
