Targeted Next-generation Sequencing and Bioinformatics Pipeline to Evaluate Genetic Determinants of Constitutional Disease

Next-generation sequencing (NGS) is rapidly changing how the genetic determinants of constitutional disease are studied. The technique is highly efficient, producing millions of sequencing reads in a short time and at relatively low cost. Targeted NGS, specifically, focuses the investigation on genomic regions of particular interest for the disease under study. This further reduces cost and turnaround time, and it lessens the computational burden that often accompanies NGS. Although targeted NGS is restricted to selected regions of the genome, which prevents the identification of novel loci of interest, it is well suited to phenotypically and genetically heterogeneous diseases for which genetic associations are already known. Because the sequencing technique is complex, protocols and methodologies must be followed closely to obtain sequencing reads of high coverage and quality. Once reads are obtained, a bioinformatics workflow is used to map them accurately to a reference genome, to call variants, and to confirm that the variants pass quality metrics. Variants must also be annotated and curated according to their clinical significance, a step that can be standardized by applying the American College of Medical Genetics and Genomics Pathogenicity Guidelines. The methods presented here describe the steps involved in generating and analyzing NGS data from a targeted sequencing panel, using the ONDRISeq neurodegenerative disease panel as a model, to identify variants that may be of clinical significance.
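As a rough illustration of the quality-metric step described above, the following minimal Python sketch filters variant records in VCF-style tab-separated text by call quality (QUAL) and read depth (the DP key in the INFO column). It is not the ONDRISeq workflow: the function names, the example records, and the thresholds of 30 are assumptions chosen only for illustration, and a real pipeline would rely on an established variant caller and its own recommended filters.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float  # Phred-scaled call quality from the QUAL column
    depth: int   # read depth taken from the INFO key DP


def parse_vcf_line(line: str) -> Variant:
    """Parse one VCF data line (the first eight tab-separated columns)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]
    # INFO is a semicolon-separated list; keep only key=value entries.
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    # A missing QUAL is written as "."; treat it as 0.0 so it fails the filter.
    qual_value = float(qual) if qual != "." else 0.0
    return Variant(chrom, int(pos), ref, alt, qual_value, int(info_map.get("DP", 0)))


def passes_quality(v: Variant, min_qual: float = 30.0, min_depth: int = 30) -> bool:
    """Hypothetical thresholds; tune them for the panel and caller in use."""
    return v.qual >= min_qual and v.depth >= min_depth


def filter_vcf(lines: Iterable[str], min_qual: float = 30.0, min_depth: int = 30) -> List[Variant]:
    """Return the variants that pass both thresholds, skipping header lines."""
    variants = (parse_vcf_line(l) for l in lines if l.strip() and not l.startswith("#"))
    return [v for v in variants if passes_quality(v, min_qual, min_depth)]


if __name__ == "__main__":
    # Toy records invented for this example; they do not describe real variants.
    example = [
        "##fileformat=VCFv4.2",
        "chr1\t100\t.\tA\tG\t50.0\tPASS\tDP=40;AF=0.5",
        "chr1\t200\t.\tC\tT\t10.0\tPASS\tDP=40",
        "chr2\t300\t.\tG\tA\t60.0\tPASS\tDP=5",
    ]
    for v in filter_vcf(example):
        print(v.chrom, v.pos, v.ref, v.alt, v.qual, v.depth)
```

Keeping the thresholds as function parameters makes it easy to apply stricter or looser cut-offs per panel; variants that pass would then go on to annotation and curation.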
