Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR

Recent developments in sequencing techniques have enabled rapid, high-throughput generation of sequence data, making it possible for individual laboratories to compile information on large numbers of genetic variants. However, there is a growing gap between the generation of raw sequencing data and the extraction of meaningful biological information. Here, we describe a protocol that uses the ANNOVAR (ANNOtate VARiation) software to facilitate fast and easy variant annotation, including gene-based, region-based and filter-based annotation of a variant call format (VCF) file generated from human genomes. We further describe a protocol for gene-based annotation of a newly sequenced nonhuman species. Finally, we describe how to use a user-friendly and easily accessible web server called wANNOVAR to prioritize candidate genes for a Mendelian disease. The variant annotation protocols take 5–30 min of computer time, depending on the size of the variant file, and 5–10 min of hands-on time. In summary, through the command-line tool and the web server, these protocols provide a convenient means to analyze genetic variants generated in humans and other species.
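
As a rough illustration of how the three annotation types might be combined in one run, the sketch below assembles (but does not execute) a command line for ANNOVAR's `table_annovar.pl` script. The helper name `build_table_annovar_command`, the input file `sample.vcf`, the database directory `humandb/`, the output prefix and the particular databases chosen (`refGene`, `cytoBand`, `exac03`) are assumptions made for this example, not details taken from the protocol; the databases would need to be downloaded beforehand, and the exact options should be checked against the ANNOVAR documentation for the installed version.

```python
from typing import List, Sequence, Tuple

# The three basic operation codes accepted by table_annovar.pl:
# g = gene-based, r = region-based, f = filter-based annotation.
VALID_OPERATIONS = {"g", "r", "f"}


def build_table_annovar_command(
    vcf_path: str,
    db_dir: str,
    annotations: Sequence[Tuple[str, str]],
    build: str = "hg19",
    out_prefix: str = "myanno",
) -> List[str]:
    """Return an argument list for table_annovar.pl (hypothetical helper).

    `annotations` is a sequence of (database name, operation code) pairs,
    so every protocol is guaranteed to have exactly one matching operation.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")

    protocols = [name for name, _ in annotations]
    operations = [op for _, op in annotations]

    unknown = [op for op in operations if op not in VALID_OPERATIONS]
    if unknown:
        raise ValueError(f"unsupported operation code(s): {unknown}")

    return [
        "table_annovar.pl", vcf_path, db_dir,
        "-buildver", build,                  # reference genome build
        "-out", out_prefix,                  # prefix for output files
        "-remove",                           # delete intermediate files
        "-protocol", ",".join(protocols),    # databases, in order
        "-operation", ",".join(operations),  # matching annotation types
        "-nastring", ".",                    # placeholder for missing values
        "-vcfinput",                         # treat the input as a VCF file
    ]


if __name__ == "__main__":
    # Illustrative choices only: one database per annotation type.
    command = build_table_annovar_command(
        "sample.vcf",
        "humandb/",
        [("refGene", "g"), ("cytoBand", "r"), ("exac03", "f")],
    )
    print(" ".join(command))
```

Pairing each database with its operation code in a single tuple keeps the `-protocol` and `-operation` lists the same length and in the same order, which the script requires. The resulting list could be passed to `subprocess.run` once ANNOVAR and the chosen databases are installed.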
