Triplet repeats in human genome: distribution and their association with genes and other genomic regions

MOTIVATION: Simple sequence repeats (SSRs), or microsatellite repeats, are abundant in many prokaryotic and eukaryotic genomes. Among SSRs, triplet repeats are of special significance because some of them have been linked to various genetic disorders. The objective of this study is to analyze the triplet repeats of the complete human genome and to identify the genes that contain triplet repeats in their coding regions. The analysis will help to identify candidate genes that have potential for repeat expansion.

RESULTS: We analyzed triplet repeats in the complete human genome using publicly available sequences. AGC and CCG repeats were predominant in the coding regions of the genome, while CCG repeats were relatively abundant in the UTRs and upstream sequences. Analysis of triplet repeat density (bp/Mb) showed that AAT and AAC were the most abundant repeats in the human genome, whereas ACT and ACG were the rarest. We identified about 2135 known or predicted genes that were associated with at least one triplet repeat type. A large proportion of the putative transcripts identified by gene-finding programs were associated with triplet repeats. These transcripts are candidate genes for the analysis of triplet repeat expansion and its possible association with disease phenotypes. Of particular interest are the 171 genes that contain a minimum of ten repeat units: because of the expansion potential of these repeats, the genes are natural targets for future studies correlating repeat expansion with disease phenotypes. The list of genes and other details of the analysis are given in the online supplementary data (http://www.ingenovis.com/tripletrepeats).
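To make the density measure (bp/Mb) and the ten-unit threshold concrete, the following Python sketch scans a DNA string for perfect triplet repeats and reports the repeat density per class. It is an illustration only: the abstract does not describe the detection method, so the regular-expression search, the restriction to perfect repeats, and the grouping of motifs into classes by cyclic permutation and reverse complement (so that CAG and CTG both count as AGC) are assumptions, and the function names `canonical_motif`, `find_triplet_repeats`, and `repeat_density` are hypothetical.

```python
import re
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Return the repeat class of a motif (assumed convention: the
    alphabetically smallest rotation of the motif or its reverse complement)."""
    rev_comp = motif.translate(_COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, rev_comp) for i in range(len(m))]
    return min(rotations)


def find_triplet_repeats(seq, min_units=10):
    """Yield (start, length_bp, repeat_class) for each perfect triplet repeat
    with at least min_units consecutive copies of the motif."""
    seq = seq.upper()
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_units - 1))
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if motif[0] == motif[1] == motif[2]:
            continue  # AAA, CCC, ... are mononucleotide runs, not triplet repeats
        yield match.start(), match.end() - match.start(), canonical_motif(motif)


def repeat_density(seq, min_units=10):
    """Return {repeat_class: density}, where density is the number of repeat
    base pairs per megabase of sequence scanned (bp/Mb)."""
    if not seq:
        return {}
    total_bp = defaultdict(int)
    for _, length, repeat_class in find_triplet_repeats(seq, min_units):
        total_bp[repeat_class] += length
    return {cls: bp * 1_000_000 / len(seq) for cls, bp in total_bp.items()}
```

Calling `repeat_density(chromosome_sequence)` on a chromosome-sized string would give one density value per repeat class. Restricting the scan to annotated coding regions, UTRs, or upstream sequences would require coordinates from a gene annotation, which this sketch does not handle.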
