Detecting Expansions of Tandem Repeats in Cohorts Sequenced with Short-Read Sequencing Data.

Repeat expansions cause more than 30 inherited disorders, predominantly neurogenetic. These can present with overlapping clinical phenotypes, making molecular diagnosis challenging. Single-gene or small-panel PCR-based methods can help to identify the precise genetic cause, but they can be slow and costly and often yield no result. Researchers are increasingly performing genomic analysis via whole-exome and whole-genome sequencing (WES and WGS) to diagnose genetic disorders. However, until recently, analysis protocols could not identify repeat expansions in these datasets. We developed exSTRa (expanded short tandem repeat algorithm), a method that uses either WES or WGS to identify repeat expansions. Performance of exSTRa was assessed in a simulation study. In addition, four retrospective cohorts of individuals with eleven different known repeat-expansion disorders were analyzed with exSTRa. We assessed results by comparing the findings to known disease status. Performance was also compared to three other analysis methods (ExpansionHunter, STRetch, and TREDPARSE), which were developed specifically for WGS data. Expansions in the assessed STR loci were successfully identified in WES and WGS datasets by all four methods with high specificity and sensitivity. Overall, exSTRa demonstrated more robust and superior performance for WES data than did the other three methods. We demonstrate that exSTRa can be effectively utilized as a screening tool for detecting repeat expansions in WES and WGS data, although the best performance would be produced by consensus calling, wherein at least two out of the four currently available screening methods call an expansion.
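The consensus rule described above (an expansion is reported when at least two of the four methods call it) can be illustrated with a minimal sketch. This is not code from exSTRa or any of the other tools. The function name `consensus_calls`, the input layout (one set of `(sample, locus)` pairs per method), and the sample and locus labels are hypothetical, and each tool's own output would first have to be reduced to such a set of calls.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Call = Tuple[str, str]  # (sample ID, STR locus), a hypothetical representation


def consensus_calls(calls_by_method: Dict[str, Set[Call]],
                    min_methods: int = 2) -> Set[Call]:
    """Return the (sample, locus) pairs flagged as expanded by at least
    `min_methods` of the supplied methods."""
    support = Counter()
    for flagged in calls_by_method.values():
        # Each method contributes at most one vote per (sample, locus) pair,
        # because its calls are stored as a set.
        support.update(flagged)
    return {call for call, n in support.items() if n >= min_methods}


# Made-up calls for illustration only; these are not results from the study.
example_calls = {
    "exSTRa": {("S1", "HTT"), ("S2", "ATXN1")},
    "ExpansionHunter": {("S1", "HTT")},
    "STRetch": {("S3", "FMR1")},
    "TREDPARSE": set(),
}

if __name__ == "__main__":
    print(sorted(consensus_calls(example_calls)))
```

Raising `min_methods` trades sensitivity for specificity; the threshold of two matches the rule stated in the abstract.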
