Organization and Evolution of Primate Centromeric DNA from Whole-Genome Shotgun Sequence Data

The major DNA constituent of primate centromeres is alpha-satellite DNA. As much as 2%–5% of the sequence generated by primate genome sequencing projects consists of this material, yet because it is highly repetitive it remains fragmented or unassembled in published genome sequences. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm that uses paired-end sequence data to computationally predict potential higher-order array structure, and we then validate the predicted organization and distribution experimentally. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationships among these sequences and provide further support for a model of their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites between the Old World monkey and ape lineages.
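
The abstract does not describe how higher-order structure is detected, so the following is only a hedged illustration of the underlying idea. The sketch assumes the monomers of a single read have already been excised and aligned to equal length. This is a simplification, since real alpha-satellite monomers are roughly 171 bp and contain indels. It uses Hamming distance to find the smallest period k at which monomer i and monomer i + k are nearly identical. The approach relies on the fact that copies of a higher-order repeat unit are far more alike than the monomers within one unit. The function names `hamming_fraction` and `hor_period` and the 5% threshold are hypothetical choices. This single-read periodicity check is not the paired-end algorithm the authors describe.

```python
from typing import List, Optional

# Hypothetical sketch, not the authors' paired-end algorithm: estimate a
# higher-order repeat (HOR) period from pre-aligned, equal-length monomers.


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("monomers must be non-empty and aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def hor_period(monomers: List[str], max_period: int,
               threshold: float = 0.05) -> Optional[int]:
    """Return the smallest period k for which the mean divergence between
    monomer i and monomer i + k is below `threshold` (an assumed cutoff),
    or None if no period up to `max_period` qualifies."""
    for k in range(1, min(max_period, len(monomers) - 1) + 1):
        divergences = [hamming_fraction(monomers[i], monomers[i + k])
                       for i in range(len(monomers) - k)]
        if sum(divergences) / len(divergences) < threshold:
            return k
    return None


# Toy 20-bp "monomers" standing in for ~171-bp alpha-satellite monomers.
A = "AAAAAAAAAACCCCCCCCCC"
B = "AAAAAGGGGGCCCCCTTTTT"
C = "TTTTTAAAAAGGGGGCCCCC"
A_variant = "T" + A[1:]  # one substitution relative to A

# Four copies of a 3-monomer HOR unit (A, B, C); one copy carries a mutation.
array = [A, B, C, A, B, C, A_variant, B, C, A, B, C]
print(hor_period(array, max_period=6))  # prints 3
```

The sketch scans periods from smallest to largest so that it returns the fundamental period. Multiples of that period (6, 9, ...) would also show low divergence. The single mutated copy raises the mean divergence at k = 3 only slightly, to about 1%, which stays well below the assumed cutoff.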
