Genome-wide profiling of heritable and de novo STR variations

Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, genotyping STRs from high-throughput sequencing data has proven problematic. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping and phasing STRs from Illumina sequencing data, and we report a genome-wide analysis and validation of de novo STR mutations. HipSTR is freely available at https://hipstr-tool.github.io/HipSTR.
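To make the notion of a de novo STR mutation concrete, the sketch below checks whether a child's diploid STR genotype can be explained by inheriting one allele from each parent. It is an illustration only, not HipSTR's algorithm: the function name `is_mendelian_consistent` and the representation of each genotype as a pair of repeat counts are assumptions made for this example, and it ignores genotype uncertainty, phasing, and sequencing error.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """Return True if the child's genotype can be formed by taking
    one allele from the father and one from the mother.

    Each genotype is a tuple of two allele lengths (repeat counts).
    This representation is hypothetical and chosen for illustration.
    """
    c1, c2 = child
    # Try every (paternal allele, maternal allele) pair and both
    # assignments of the child's two alleles to the two parents.
    return any(
        (c1 == f and c2 == m) or (c2 == f and c1 == m)
        for f, m in product(father, mother)
    )


# Child alleles 12 and 14 can come from father (12) and mother (14).
consistent = is_mendelian_consistent((12, 14), (12, 13), (14, 15))

# Allele 16 is absent from both parents: a candidate de novo mutation.
candidate_de_novo = not is_mendelian_consistent((12, 16), (12, 13), (14, 15))
```

A genotype that fails this check is only a candidate: in practice, apparent Mendelian violations can also arise from genotyping errors, which is why validation of such calls matters.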
