A framework for transcriptome-wide association studies in breast cancer in diverse study populations

Background: The relationship between germline genetic variation and breast cancer survival is largely unknown, especially in understudied minority populations, who often have poorer survival. Genome-wide association studies (GWAS) have interrogated breast cancer survival, but they are often underpowered because of subtype heterogeneity and clinical covariates, and they tend to detect loci in non-coding regions that are difficult to interpret. Transcriptome-wide association studies (TWAS) show increased power to detect functionally relevant loci by leveraging expression quantitative trait loci (eQTLs) from external reference panels in relevant tissues. However, ancestry- or race-specific reference panels may be needed to draw correct inference in ancestrally diverse cohorts, and such panels are lacking for breast cancer.

Results: We provide a framework for TWAS of breast cancer in diverse populations, using data from the Carolina Breast Cancer Study (CBCS), a population-based cohort that oversampled black women. We perform eQTL analysis for 406 breast cancer-related genes to train race-stratified predictive models of tumor expression from germline genotypes. Using these models, we impute expression in independent data from CBCS and The Cancer Genome Atlas (TCGA), accounting for sampling variability when assessing performance. These models are not applicable across race, and their predictive performance varies across tumor subtypes. Within CBCS (N = 3,828), at a false discovery-adjusted significance level of 0.10 and stratifying by race, we identify associations in black women near AURKA, CAPN13, PIK3CA, and SERPINB5 via TWAS that are underpowered in GWAS.

Conclusions: We show that carefully implemented and thoroughly validated TWAS is an efficient approach for understanding the genetics underpinning breast cancer outcomes in diverse populations.
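The workflow summarized in the Results can be illustrated with a minimal sketch: fit a separate genotype-to-expression model within each stratum, impute expression in held-out samples, compare own-stratum against cross-stratum prediction, and adjust p-values for false discovery. This is not the authors' pipeline. Closed-form ridge regression stands in for whatever predictive model is actually used, the data are simulated, and the function names, stratum labels "A" and "B", and effect sizes are all hypothetical.

```python
import numpy as np

def fit_ridge(G, y, lam=1.0):
    """Closed-form ridge regression of expression y on genotype dosages G."""
    g_mean, y_mean = G.mean(axis=0), y.mean()
    Gc = G - g_mean
    w = np.linalg.solve(Gc.T @ Gc + lam * np.eye(G.shape[1]), Gc.T @ (y - y_mean))
    return w, y_mean - g_mean @ w  # (SNP weights, intercept)

def train_stratified(G, y, strata, lam=1.0):
    """Fit one expression model per stratum label."""
    return {s: fit_ridge(G[strata == s], y[strata == s], lam)
            for s in np.unique(strata)}

def impute(model, G):
    """Predict expression from genotypes with a fitted (weights, intercept) pair."""
    w, b = model
    return G @ w + b

def r_squared(y, y_hat):
    """Squared Pearson correlation between observed and imputed expression."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adj = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.clip(adj, 0.0, 1.0)
    return out

def simulate(n=600, seed=0):
    """Toy data (assumption): stratum A eQTLs on SNPs 0-4, stratum B on SNPs 5-9."""
    rng = np.random.default_rng(seed)
    G = rng.binomial(2, 0.3, size=(n, 10)).astype(float)
    strata = np.where(np.arange(n) % 2 == 0, "A", "B")
    effects = {"A": np.r_[np.full(5, 0.5), np.zeros(5)],
               "B": np.r_[np.zeros(5), np.full(5, 0.5)]}
    y = np.array([G[i] @ effects[s] for i, s in enumerate(strata)])
    y += rng.normal(0.0, 0.5, size=n)
    return G, y, strata

if __name__ == "__main__":
    G, y, strata = simulate()
    train = np.arange(len(y)) < 400            # first 400 samples train the models
    models = train_stratified(G[train], y[train], strata[train])
    held_a = ~train & (strata == "A")          # held-out samples from stratum A
    print("A model on held-out A:", r_squared(y[held_a], impute(models["A"], G[held_a])))
    print("B model on held-out A:", r_squared(y[held_a], impute(models["B"], G[held_a])))
    print("BH-adjusted p-values:", benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

In this toy setup the two strata have different causal SNPs by construction, so a model trained in one stratum is expected to predict poorly in the other. Real cohorts would differ in allele frequencies and linkage disequilibrium rather than in such a clean split, and the association step for survival outcomes is omitted here.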
