Recent advances in the detection of repeat expansions with short-read next-generation sequencing

Short tandem repeats (STRs), also known as microsatellites, are commonly defined as tandemly repeated nucleotide motifs of 2–6 base pairs in length. STRs occur throughout the human genome, and about 239,000 are documented in the Simple Repeats Track of the UCSC (University of California, Santa Cruz) genome browser. Because STRs vary in size, they are highly polymorphic and are commonly used as genetic markers. A small fraction of STRs (about 30 loci) have been associated with human disease, which arises when one or both alleles exceed an STR-specific size threshold. Repeat expansions are currently detected with polymerase chain reaction–based assays or, for large expansions, with Southern blots. These tests are expensive and time-consuming and are not always conclusive, leading to lengthy diagnostic journeys for patients and, potentially, missed diagnoses. Whole exome and whole genome sequencing have identified the genetic cause of many genetic disorders; however, analysis pipelines focus primarily on detecting short nucleotide variations and short insertions and deletions (indels). Until recently, repeat expansions, with the exception of the smallest expansion (SCA6), were not detectable in next-generation short-read sequencing datasets and would have been ignored in most analyses. In the last two years, four analysis methods with accompanying software (ExpansionHunter, exSTRa, STRetch, and TREDPARSE) have been released. Although a comprehensive comparative analysis of the performance of these methods across all known repeat expansions is still lacking, it is clear that these methods are a valuable addition to any existing analysis pipeline. Here, we detail how to assess short-read data for evidence of expansions, reviewing all four methods and outlining their strengths and weaknesses. Implementation of these methods should increase the diagnostic yield for repeat expansion disorders at known STR loci and has the potential to detect novel repeat expansions.
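
To make the notion of an STR-specific size threshold concrete, the following is a minimal Python sketch that counts the longest uninterrupted run of a motif in a single sequence and compares it with a threshold. It is a toy illustration only and is not how ExpansionHunter, exSTRa, STRetch, or TREDPARSE operate: those tools work on aligned reads and must handle reads that do not span the repeat. The function names, the example sequence, and the threshold values are hypothetical and chosen purely for illustration.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    # A lookahead lets the search start at every position, so runs that
    # overlap an earlier partial match are still found.
    pattern = re.compile(f"(?=((?:{re.escape(motif.upper())})+))")
    runs = (len(m.group(1)) // len(motif) for m in pattern.finditer(sequence.upper()))
    return max(runs, default=0)


def exceeds_threshold(sequence: str, motif: str, threshold: int) -> bool:
    """True if the longest run reaches `threshold` copies (hypothetical cutoff)."""
    return longest_tandem_run(sequence, motif) >= threshold


if __name__ == "__main__":
    read = "TTGA" + "CAG" * 12 + "CCGT"  # made-up sequence, not real data
    print(longest_tandem_run(read, "CAG"))
    print(exceeds_threshold(read, "CAG", threshold=10))  # illustrative threshold
```

Because a short read can contain only as many repeat units as fit within its length, a direct count of this kind cannot size expansions longer than the read itself, which is one reason dedicated methods are needed.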
