Masquerading repeats: paralogous pitfalls of the human genome.

In its simplest terms, the human genome consists of two distinct fractions of DNA: repetitive and unique sequence. Traditionally, a portion of the unique fraction is thought to comprise the obvious functional constituents of our genome, including exons, introns, and regulatory DNA elements. With the exception of telomeric and centromeric repeat sequences, the functional significance of the vast majority of the repetitive fraction is less clear. Since the early reassociation-kinetics experiments on single-stranded human DNA (Britten and Kohne 1968), various gradations of repetitiveness have been recognized on the basis of both copy number and degree of sequence similarity. Copy numbers range from the prolific (LINEs, SINEs, α-satellite, etc., present in hundreds of thousands of copies) to the relatively few. Because multigene families exist, genes themselves may be repetitive in nature. Many of the best-studied members of gene families (e.g., the hemoglobins and HOX genes), however, appear to be sufficiently divergent to be told apart (Ohno 1970) or to localize to discrete clusters of tandem arrays (rRNA genes, HLA genes, immunoglobulin gene segments). Such families are therefore usually distinguished by the sequence divergence of their individual members or by their clustered position within the human genome. The term “unique” DNA is thus relative, determined largely by what we already know about a given genome. The more of our genome that is sequenced, the more the total amount of “apparent” unique sequence will dwindle, with a concomitant expansion of the repeat classes. This basic paradigm of repetitive versus unique sequence underlies any effort to sequence a genome. Indeed, a genome can be sequenced and assembled only because sufficient unique sequence is interspersed among the repetitive fraction, because the repetitive fraction is sufficiently divergent, and/or because the repetitive fraction can be recognized as such.
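The relativity of “unique” DNA can be made concrete with a minimal k-mer sketch. It assumes, purely for illustration, that a k-mer counts as “unique” when it occurs exactly once in the sequence sampled so far; this is an operational stand-in, not the reassociation-kinetics criterion of Britten and Kohne. Under that assumption, the apparent unique fraction of a region drops as soon as a second clone carrying a copy of part of it is added. The function names, the value of k, and both toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every overlapping k-mer across a collection of sequences."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def unique_fraction(sequences, k=8):
    """Fraction of k-mer positions whose k-mer occurs exactly once in the sample.

    "Unique" here is relative to the sequences supplied: adding more sequence
    can only turn singletons into repeats, never the reverse.
    """
    counts = kmer_counts(sequences, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    singletons = sum(c for c in counts.values() if c == 1)
    return singletons / total


# Hypothetical toy clones: region_b carries a 20-bp copy of region_a[5:25].
region_a = "ACGTACGGTCAGTTCAGGCATCCGATTGCA"
region_b = "TTTT" + region_a[5:25] + "GGGG"

print(unique_fraction([region_a]))            # every 8-mer of region_a is seen once
print(unique_fraction([region_a, region_b]))  # shared 8-mers are now "repetitive"
```

Nothing about region_a itself changes between the two calls; only the amount of sequence it is compared against does, which is the sense in which “unique” is relative to what has already been sequenced.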
A simple corollary is widely accepted in the sequencing community: the fewer and less complicated the repeats, the easier a genome is to sequence. At a recent National Institutes of Health (NIH) meeting entitled “Genomic Alterations in Genetic Disease: Mechanisms of Structural Rearrangement,” a much more complex picture of the organization of repeat sequences in the human genome emerged. Regions of the genome that harbor large tracts (50–200 kb) of duplicated genomic segments with a remarkable degree of sequence similarity (95%–99%) are being identified, conspicuously within the subtelomeric and pericentromeric portions of chromosomes. Unlike “traditional” repeat elements, these segments appear to carry the complete or partial genomic structure of known genes, suggesting that they have recently been transposed from elsewhere in the genome. They therefore have the appearance of normal, gene-encoding unique DNA and are not, at first glance, easily recognized as repetitive sequence. Interestingly, many of these large paralogous segments (i.e., segments whose sequence similarity arises from duplication) were discovered on either side of the breakpoint clusters of well-known microdeletion/microduplication syndromes, such as Prader–Willi syndrome (PWS) in 15q11–13, Williams syndrome in 7q11.23, Smith–Magenis syndrome (SMS) in 17p11.2, and velocardiofacial syndrome (VCFS) in 22q11.2. This suggests that they may have a role in mediating the aberrant recombination associated with instability in these regions. Our own recent survey of the genomic sequence available in GenBank (130.1 Mb) lends further credibility to this picture of complexity: we identified a total of 1.1 Mb of genomic sequence, encompassing 21 different genes, that showed remarkable sequence identity (95%–98%) to other large genomic segments or to sequenced cDNAs mapping to different locations in the genome.
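Figures such as “95%–98% sequence identity” depend on how identity is counted. The sketch below assumes a pairwise alignment has already been computed (gaps written as '-') and counts identity only over columns in which neither sequence is gapped, which is one of several common conventions. The flagging function and its default thresholds (95% identity and 50 kb, the lower bounds of the ranges quoted in the paragraph above) are hypothetical and are not the criteria used in the GenBank survey described there.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns (one convention among several)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def looks_like_segmental_duplication(aln_a, aln_b, min_identity=95.0, min_length=50_000):
    """Hypothetical filter: long, highly similar aligned pairs are flagged as candidate paralogs."""
    ungapped = sum(1 for x, y in zip(aln_a, aln_b) if x != "-" and y != "-")
    return ungapped >= min_length and percent_identity(aln_a, aln_b) >= min_identity


# Toy 100-bp "paralog" pair differing at three positions (hypothetical data).
seg_a = "ACGTACGTAC" * 10
seg_b = list(seg_a)
for pos in (10, 50, 90):
    seg_b[pos] = "T"  # each of these positions is 'A' in seg_a
seg_b = "".join(seg_b)

print(percent_identity(seg_a, seg_b))                                  # 97.0
print(looks_like_segmental_duplication(seg_a, seg_b, min_length=100))  # True

# Scale of the survey described above: 1.1 Mb of the 130.1 Mb examined.
survey_fraction = 1.1 / 130.1
print(f"{survey_fraction:.2%}")  # 0.85%
```

With the default 50-kb length threshold, the same toy pair is rejected, illustrating that both similarity and length enter into calling a segment a duplication rather than an ordinary gene-family relative.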
Most of these segments were identified among sequences mapping to the pericentromeric regions of chromosomes (2p11, 10p11, 15q11, 16p11, and 22q11), suggesting a hitherto unrecognized propensity of our genome to duplicate and transpose genomic segments to these regions. By the end of the NIH meeting, two general conclusions had been reached regarding these complex repeat regions: (1) these repeat sequences are particularly difficult to resolve from the perspective of both mapping and sequencing; and (2) the sequence and organization of these repeat regions will be critical to understanding genomic instability and disease in these regions.
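One way flanking paralogous copies could drive such instability, in the spirit of reference [6] (whose title attributes a contiguous gene deletion syndrome to homologous recombination of a flanking repeat gene cluster), is unequal crossing-over between two directly oriented copies that mispair. The toy model below treats a chromosome as a list of segment labels and shows that a single misaligned exchange yields reciprocal deletion and duplication products. The labels and function name are hypothetical; the model assumes the two repeat copies are identical and ignores repeat orientation.

```python
def unequal_crossover(chromosome, repeat):
    """Return (deletion, duplication) from a misaligned exchange between two
    homologous copies of `chromosome` at the first two copies of `repeat`.

    Homolog X pairs its second repeat copy with the first copy on homolog Y;
    a crossover inside the paired repeats swaps the flanking arms.
    """
    copies = [i for i, segment in enumerate(chromosome) if segment == repeat]
    if len(copies) < 2:
        raise ValueError("need at least two copies of the repeat")
    first, second = copies[0], copies[1]
    # Y's arm left of its first copy, joined to X's second copy and everything after it.
    deletion = chromosome[:first] + chromosome[second:]
    # X's arm left of its second copy, joined to Y's first copy and everything after it.
    duplication = chromosome[:second] + chromosome[first:]
    return deletion, duplication


# Hypothetical layout: two paralogous low-copy repeats ("LCR") flank a critical region.
chrom = ["proximal", "LCR", "critical_region", "LCR", "distal"]
deletion, duplication = unequal_crossover(chrom, "LCR")
print(deletion)     # ['proximal', 'LCR', 'distal']
print(duplication)  # ['proximal', 'LCR', 'critical_region', 'LCR', 'critical_region', 'LCR', 'distal']
```

The deletion and the reciprocal duplication of the same interval are the two products this model predicts, which parallels the pairing of microdeletion and microduplication syndromes at the same loci.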

[1] M. Adams, et al. Shotgun Sequencing of the Human Genome. 1998, Science.

[2] B. Trask, et al. Distribution of olfactory receptor genes in the human genome. 1998, Nature Genetics.

[3] B. Trask, et al. Members of the olfactory receptor gene family are contained in large blocks of DNA duplicated polymorphically near the ends of human chromosomes. 1998, Human Molecular Genetics.

[4] J. Barber, et al. Inherited interstitial duplications of proximal 15q: genotype-phenotype correlations. 1997, American Journal of Human Genetics.

[5] J. Rubin, et al. Fluorescence in situ hybridization analysis of keratinocyte growth factor gene amplification and dispersion in evolution of great apes and humans. 1997, Proceedings of the National Academy of Sciences of the United States of America.

[6] A. C. Chinault, et al. Homologous recombination of a flanking repeat gene cluster is a mechanism for a common contiguous gene deletion syndrome. 1997, Nature Genetics.

[7] M. Rocchi, et al. A third neurofibromatosis type 1 (NF1) pseudogene at chromosome 15q11.2. 1997, Human Genetics.

[8] D. Ledbetter, et al. Inter- and intrachromosomal rearrangements are both involved in the origin of 15q11-q13 deletions in Prader-Willi syndrome. 1997, American Journal of Human Genetics.

[9] R. Fulton, et al. A physical map of human chromosome 7: an integrated YAC contig map with average STS spacing of 79 kb. 1997, Genome Research.

[10] E. Eichler, et al. Interchromosomal duplications of the adrenoleukodystrophy locus: a phenomenon of pericentromeric plasticity. 1997, Human Molecular Genetics.

[11] G. Miao, et al. The dawn of the post-genome era, seen from the ocean front. 1997, Trends in Biotechnology.

[12] M. Lefranc, et al. Immunoglobulin lambda light chain orphons on human chromosome 8q11.2. 1997, European Journal of Immunology.

[13] B. Dutrillaux, et al. Emergence and scattering of multiple neurofibromatosis (NF1)-related sequences during hominoid evolution suggest a process of pericentromeric interchromosomal transposition. 1997, Human Molecular Genetics.

[14] D. Ledbetter, et al. Refined molecular characterization of the breakpoints in small inv dup(15) chromosomes. 1996, Human Genetics.

[15] E. Eichler, et al. Duplication of a gene-rich cluster between 16p11.1 and Xq28: a novel pericentromeric-directed mechanism for paralogous genome evolution. 1996, Human Molecular Genetics.

[16] J. Craig Venter, et al. A new strategy for genome sequencing. 1996, Nature.

[17] C. G. See, et al. A 9.75-Mb map across the centromere of human chromosome 10. 1996, Genomics.

[18] J. Lupski, et al. A recombination hotspot responsible for two inherited peripheral neuropathies is located near a mariner transposon-like element. 1996, Nature Genetics.

[19] T. Hudson, et al. Long-range mapping and construction of a YAC contig within the cat eye syndrome critical region. 1994, Genome Research.

[20] I. Dunham, et al. Molecular definition of the 22q11 deletions in velo-cardio-facial syndrome. 1995, American Journal of Human Genetics.

[21] J. Wienberg, et al. Comparative mapping of DNA probes derived from the V kappa immunoglobulin gene regions on human and great ape chromosomes by fluorescence in situ hybridization. 1995, Genomics.

[22] H. Zachau, et al. The immunoglobulin κ locus of primates. 1995.

[23] B. Emanuel, et al. Molecular characterization of the marker chromosome associated with cat eye syndrome. 1994, American Journal of Human Genetics.

[24] N. Carter, et al. Human immunoglobulin VH and D segments on chromosomes 15q11.2 and 16p11.2. 1994, Human Molecular Genetics.

[25] H. Zachau. The immunoglobulin kappa locus-or-what has been learned from looking closely at one-tenth of a percent of the human genome. 1993, Gene.

[26] X. Chen, et al. Assignment of the human aggrecan gene (AGC1) to 15q26 using fluorescence in situ hybridization analysis. 1993, Genomics.

[27] M. Pinotti, et al. In-frame deletion of von Willebrand factor A domains in a dominant type of von Willebrand disease. 1993, Human Molecular Genetics.

[28] J. Thacker, et al. Formation of large deletions by illegitimate recombination in the HPRT gene of primary human fibroblasts. 1993, Proceedings of the National Academy of Sciences of the United States of America.

[29] A. Baldini, et al. Low-copy-number repeat sequences flank the DiGeorge/velo-cardio-facial syndrome loci at 22q11. 1993, Human Molecular Genetics.

[30] A. Winterpacht, et al. A putative gene family in 15q11-13 and 16p11.2: possible implications for Prader-Willi and Angelman syndromes. 1992, Proceedings of the National Academy of Sciences of the United States of America.

[31] M. Olson, et al. Chromosomal region of the cystic fibrosis gene in yeast artificial chromosomes: a model for human genome mapping. 1990, Science.

[32] A. Jeffreys, et al. A novel human DNA polymorphism resulting from transfer of DNA from chromosome 6 to chromosome 16. 1990, Genomics.

[33] H. Hameister, et al. Transposition of human immunoglobulin V kappa genes within the same chromosome and the mechanism of their amplification. 1990, The EMBO Journal.

[34] H. Zachau, et al. Structural features of transposed human VK genes and implications for the mechanism of their transpositions. 1990, Nucleic Acids Research.

[35] S. Ohno. Evolution by Gene Duplication. 1970, Springer Berlin Heidelberg.

[36] R. Britten, et al. Repeated Sequences in DNA. 1968.