Profiling short tandem repeats from short reads.

Short tandem repeats (STRs), also known as microsatellites, have a wide range of applications, including medical genetics, forensics, and population genetics. High-throughput sequencing has the potential to profile large numbers of STRs, but cumbersome gapped alignment and STR-specific noise patterns hamper this task. We recently developed an algorithm, called lobSTR, to overcome these challenges and to accurately profile STRs from short reads. Here we describe how to use lobSTR to call STR variations from high-throughput sequencing datasets and to diagnose the quality of the calls.
