Paper information - Genotyping microsatellites in next-generation sequencing data

Background

Microsatellites are short (2-6 bp) DNA sequences repeated in tandem, which make up approximately 3% of the human genome [1]. These loci are highly polymorphic and prone to frequent mutation, with estimated mutation rates of 10 10 events per locus per generation, orders of magnitude higher than in other parts of the genome [2]. Dozens of neurological and developmental disorders have been attributed to microsatellite expansions [3]. Microsatellites have also been implicated in a range of functions, such as DNA replication and repair, chromatin organisation and regulation of gene expression [4]. Traditionally, microsatellite variation has been measured using capillary gel electrophoresis [5]. In addition to being time-consuming and expensive, this method fails to reveal the full complexity of these loci because it does not sequence the fragment directly but measures only the number of bases in the repeat. Next-generation sequencing has the potential to address these problems. However, determining microsatellite lengths from next-generation sequencing data is difficult. In particular, polymerase slippage during PCR amplification introduces stutter noise. A small number of software tools have been written to genotype simple microsatellites in next-generation sequencing data [6-8]; however, they do not address SNPs or compound repeats, and in some cases they provide only approximate genotypes. We have begun to develop a microsatellite genotyping algorithm that addresses these issues, providing high accuracy as well as a more detailed analysis of microsatellite loci. We have validated it using high-depth amplicon sequencing data from microsatellites near the AVPR1A gene.
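To make the definition above concrete, the following Python sketch scans a sequence for perfect tandem repeats with a 2-6 bp unit. It is an illustration only and not the genotyping algorithm described in this abstract: the function name `find_microsatellites` and the `min_copies` threshold are assumptions introduced here, and the sketch deliberately ignores the SNPs, compound repeats and stutter noise that the abstract identifies as the hard parts of the problem.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    Assumptions (not taken from the abstract): a locus is reported only
    if the unit occurs at least `min_copies` times in a row, and only
    exact copies of the unit are counted.
    """
    # The lazy quantifier prefers the shortest unit, so "CACACACA" is
    # reported as (CA)x4 rather than (CACA)x2.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Example: a (CA)x5 repeat flanked by non-repetitive sequence.
print(find_microsatellites("GGTCACACACACATTG"))  # [(3, 'CA', 5)]
```

A known limitation of this sketch is that a homopolymer run such as `AAAAAA` is reported as a dinucleotide repeat with unit `AA`, so a real tool would need to filter out units that are themselves repeats of a shorter motif.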

[1] Huda Y. Zoghbi, et al. Diseases of Unstable Repeat Expansion: Mechanisms and Common Principles, 2005, Nature Reviews Genetics.

[2] R. Petit, et al. Current trends in microsatellite genotyping, 2011, Molecular ecology resources.

[3] International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome, 2001, Nature.

[4] Minh Duc Cao, et al. Inferring short tandem repeat variation from paired-end short reads, 2013, Nucleic acids research.

[5] Matthieu Legendre, et al. Variable tandem repeats accelerate evolution of coding and regulatory sequences, 2010, Annual review of genetics.

[6] S. Rosset, et al. lobSTR: A short tandem repeat profiler for personal genomes, 2012, RECOMB.

[7] E. Nevo, et al. Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review, 2002, Molecular ecology.

[8] G. Highnam, et al. Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles, 2012, Nucleic acids research.