Using Prior Information from the Medical Literature in GWAS of Oral Cancer Identifies Novel Susceptibility Variant on Chromosome 4 - the AdAPT Method

Background: Genome-wide association studies (GWAS) require large sample sizes to obtain adequate statistical power, but power may be increased by incorporating complementary data. In this study we investigated the feasibility of automatically retrieving information from the medical literature and leveraging it in GWAS.

Methods: We developed the Adjusting Association Priors with Text (AdAPT) method, which searches PubMed abstracts for pre-assigned keywords and key concepts and uses this information to assign each single nucleotide polymorphism (SNP) a prior probability of association with the phenotype of interest. Association results from a GWAS can subsequently be ranked in the context of these priors using the Bayes False Discovery Probability (BFDP) framework. We initially tested AdAPT by comparing rankings of known susceptibility alleles in a previous lung cancer GWAS, and subsequently applied it in a two-phase GWAS of oral cancer.

Results: Known lung cancer susceptibility SNPs were consistently ranked higher by AdAPT BFDPs than by p-values. In the oral cancer GWAS, we sought to replicate the top five SNPs as ranked by AdAPT BFDPs. Of these, rs991316, located in the ADH gene region of 4q23, displayed a statistically significant association with oral cancer risk in the replication phase (per-rare-allele log-additive p-value [p_trend] = 2.5×10⁻³). The combined odds ratio (OR) for carrying one additional rare allele was 0.83 (95% CI: 0.76–0.90), and this association was independent of previously identified susceptibility SNPs in this gene region that are associated with overall upper aerodigestive tract (UADT) cancer. We also investigated whether rs991316 was associated with other UADT cancers, but found no additional association signal.

Conclusion: This study highlights the potential utility of systematically incorporating prior knowledge from the medical literature in genome-wide analyses using the AdAPT methodology.
AdAPT is available online (url: http://services.gate.ac.uk/lld/gwas/service/config).
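
The ranking step described in the Methods can be illustrated with a minimal Python sketch of the BFDP calculation, based on Wakefield's approximate Bayes factor. This is not the AdAPT implementation. The function names, the SNP identifiers, the example effect sizes and the default prior standard deviation (chosen so that roughly 95% of prior odds ratios fall below 1.5) are all assumptions made for illustration, and the per-SNP prior probabilities are taken as given inputs rather than derived from PubMed.

```python
import math


def approx_bayes_factor(beta_hat, se, prior_sd):
    """Approximate Bayes factor for H0 (no effect) versus H1.

    beta_hat: estimated log odds ratio; se: its standard error;
    prior_sd: standard deviation of the N(0, W) prior on the log odds
    ratio under H1.
    """
    v = se ** 2          # sampling variance of the estimate
    w = prior_sd ** 2    # prior variance of the effect under H1
    z = beta_hat / se
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z * z * w / (v + w))


def bfdp(beta_hat, se, prior_assoc, prior_sd=math.log(1.5) / 1.96):
    """Bayes False Discovery Probability for one SNP.

    prior_assoc is the prior probability that the SNP is associated with
    the phenotype (assumed here to be supplied by a text-mining step).
    """
    if not 0.0 < prior_assoc < 1.0:
        raise ValueError("prior_assoc must lie strictly between 0 and 1")
    prior_odds_null = (1.0 - prior_assoc) / prior_assoc
    x = approx_bayes_factor(beta_hat, se, prior_sd) * prior_odds_null
    return x / (1.0 + x)


def rank_by_bfdp(snps):
    """Sort SNP records so that the lowest BFDP (most noteworthy) is first."""
    return sorted(snps, key=lambda s: bfdp(s["beta"], s["se"], s["prior"]))


if __name__ == "__main__":
    # Hypothetical SNPs with identical association evidence but different priors.
    snps = [
        {"id": "snp_a", "beta": -0.19, "se": 0.043, "prior": 1e-4},
        {"id": "snp_b", "beta": -0.19, "se": 0.043, "prior": 1e-2},
    ]
    for s in rank_by_bfdp(snps):
        print(s["id"], bfdp(s["beta"], s["se"], s["prior"]))
```

In this sketch two SNPs with the same estimate and standard error receive different BFDPs solely because of their priors, which is how literature-derived priors can change a ranking relative to p-values.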

Angus Roberts | Jon Wakefield | Hamish Cunningham | Niraj Aswani | Wolfgang Ahrens | Mark A. Greenwood | Antonio Agudo | Mattias Johansson | Peter Thomson | Ariana Znaor | Jolanta Lissowska | Paolo Boffetta | Lorenzo Richiardi | Xavier Castellsagué | Pagona Lagiou | Stefania Boccia | Vladimir Janout | Paul Brennan | Rolando Herrero | Silvia Franceschi | Lars Vatten | Claire M. Healy | Mark Lathrop | Diana Zelenika | Yaoyong Li | Lenka Foretova | Renato Talamini | Pilar Galan | Eleonóra Fabiánová | Jose Eluf-Neto | David Zaridze | Vladimir Bencko | Neonilia Szeszenia-Dabrowska | Ivana Holcátová | Graham Byrnes | Nalin S. Thakker | Simone Benhamou | James D. McKay | Luigi Barzan | Cristina Canova | David I. Conway | Kristina Kjaerheim | Tatiana V. Macfarlane | Dan Chen | Sergio Koifman | Maria Paula Curado | Ana Menezes | Victor Wünsch-Filho | Manon Delahaye-Sourdeix | Ioan Nicolae Mates | Leticia Fernandez Garrote
