Prioritizing Disease‐Linked Variants, Genes, and Pathways with an Interactive Whole‐Genome Analysis Pipeline

Whole‐genome sequencing (WGS) studies are uncovering disease‐associated variants in both rare and nonrare diseases. Using next‐generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and the accuracy and reproducibility of annotation results are essential for clinical implementation. However, annotating WGS data with up‐to‐date genomic information remains challenging for biomedical researchers. Here, we present gNOME, a highly scalable annotation, filtering, and analysis pipeline that is among the fastest available, to prioritize phenotype‐associated variants while minimizing false‐positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype‐associated variants, and result summaries are provided at the variant, gene, and genome levels. Moreover, enrichment results for specific variants, genes, and gene sets, computed either between two groups or against the population‐scale WGS datasets already integrated in the pipeline, can aid interpretation. We found a small number of discordant results between annotation software tools, due in part to different reporting strategies for variants with complex impacts. Using two published whole‐exome datasets of uveal melanoma and bladder cancer, we demonstrated gNOME's accuracy of variant annotation and the enrichment of loss‐of‐function variants in known cancer pathways. The gNOME Web server and source code are freely available to the academic community (http://gnome.tchlab.org).
