Identifying Highly Penetrant Disease Causal Mutations Using Next Generation Sequencing: Guide to Whole Process

Recent technological advances have created challenges for geneticists, who must adapt to a wide range of new bioinformatics tools and an expanding wealth of publicly available resources (e.g., mutation databases and software). The wide range of methods and the diversity of file formats used in sequence analysis are a significant issue: a considerable amount of time is spent before anyone can even attempt to analyse the genetic basis of human disorders. Another point to consider is that, although many researchers possess “just enough” knowledge to analyse their data, they do not make full use of the available tools and databases, nor do they fully understand how their data were created. The primary aim of this review is to document some of the key approaches and to provide an analysis schema that makes the analysis process more efficient and reliable in the context of discovering highly penetrant causal mutations and genes. This review also compares the methods used to identify highly penetrant variants when data are obtained from consanguineous as opposed to nonconsanguineous individuals, and when Mendelian disorders are analysed as opposed to common complex disorders.
