Prioritization and functional assessment of noncoding variants associated with complex diseases

Identifying the functional noncoding variants associated with complex diseases remains a major challenge. We present a novel algorithm, Prioritization And Functional Assessment (PAFA), that prioritizes genetic variants and assesses their functionality by introducing population differentiation measures and recalibrating the training variants. Comprehensive evaluations demonstrate that PAFA achieves much higher sensitivity and specificity than existing methods in prioritizing noncoding risk variants. By integrating multiple annotations and metrics, PAFA also achieves improved performance in distinguishing both common and rare recurrent variants from non-recurrent variants. We also developed an integrated platform that provides comprehensive functional annotations for noncoding variants based on functional genomic data; it can be accessed at http://159.226.67.237:8080/pafa.
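
To make the notion of a population differentiation measure concrete, the following is a minimal sketch of one widely used measure, Wright's fixation index F_ST, computed for a single biallelic variant from per-population allele frequencies. It is an illustration only: the abstract does not state which measures or estimators PAFA uses, and the function name `fst_biallelic`, the equal weighting of populations, and the example frequencies are all assumptions made here.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float]) -> float:
    """Wright's F_ST for one biallelic site.

    `freqs` holds the frequency of the same allele in each population.
    Populations are weighted equally (an assumption of this sketch;
    sample-size-aware estimators differ).
    """
    if len(freqs) == 0:
        raise ValueError("at least one population frequency is required")
    if any(p < 0.0 or p > 1.0 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")

    # Mean allele frequency across populations.
    p_bar = sum(freqs) / len(freqs)

    # Expected heterozygosity in the pooled (total) population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # Site is monomorphic everywhere: no differentiation to measure.
        return 0.0

    # Mean expected heterozygosity within populations.
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)

    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical allele frequencies in three populations.
    print(fst_biallelic([0.10, 0.45, 0.80]))
```

A value near 0 means the allele frequency is similar across populations, while a value near 1 means the populations are nearly fixed for different alleles. A score of this kind could serve as one feature alongside other annotations.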
