Evaluation of microsatellite variation in the 1000 Genomes Project pilot studies is indicative of the quality and utility of the raw data and alignments.

We analyzed global microsatellite variation in the two kindreds sequenced at high depth (~20×-60×) in the 1000 Genomes Project pilot studies, because alterations in these highly mutable repetitive sequences have been linked with many phenotypes and disease risks. The standard alignment technique performs poorly in microsatellite regions: low effective coverage (~1×-5×) results in 79% of the informative loci exhibiting non-Mendelian inheritance patterns. Using a more stringent approach to compute robust allelotypes, we found that 94.4% of the 1095 informative repeats conformed to traditional Mendelian inheritance. The high-confidence allelotypes were then analyzed to estimate the minimum polymorphism rate as a function of motif length, motif sequence, and distribution within the genome.
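The Mendelian-consistency criterion used to judge allelotype quality can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes each allelotype is a pair of repeat-unit counts per individual, and the names `is_mendelian` and `conforming_fraction` are hypothetical. A child's allelotype is treated as consistent when one allele can be drawn from the mother and the other from the father.

```python
from typing import Iterable, Tuple

# An allelotype is modeled as a pair of repeat-unit counts (an assumption
# made for this illustration; the source does not specify a representation).
Allelotype = Tuple[int, int]
Trio = Tuple[Allelotype, Allelotype, Allelotype]  # (child, mother, father)


def is_mendelian(child: Allelotype, mother: Allelotype, father: Allelotype) -> bool:
    """Return True if one child allele can come from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def conforming_fraction(trios: Iterable[Trio]) -> float:
    """Fraction of loci whose trio allelotypes follow Mendelian inheritance."""
    trios = list(trios)
    if not trios:
        raise ValueError("no loci supplied")
    conforming = sum(is_mendelian(c, m, f) for c, m, f in trios)
    return conforming / len(trios)


if __name__ == "__main__":
    # Toy loci with made-up repeat counts, for demonstration only.
    loci = [
        ((12, 14), (12, 12), (14, 15)),  # consistent: 12 from mother, 14 from father
        ((12, 12), (12, 13), (14, 15)),  # inconsistent: father carries no 12 allele
        ((9, 9), (9, 10), (9, 11)),      # consistent: 9 from each parent
    ]
    print(f"{conforming_fraction(loci):.3f}")
```

A fraction computed this way is only as reliable as the allelotype calls that feed it, which is why low effective coverage in repeat regions inflates the apparent non-Mendelian rate.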
