A Comparative Study and a Phylogenetic Exploration of the Compositional Architectures of Mammalian Nuclear Genomes

For the past four decades, the compositional organization of the mammalian genome has posed a formidable challenge to molecular evolutionists attempting to explain it from an evolutionary perspective. Unfortunately, most of the explanations adhered to the “isochore theory,” which has long been rebutted. Recently, an alternative compositional domain model was proposed, depicting the human and cow genomes as composed mostly of short compositionally homogeneous and nonhomogeneous domains and a few long ones. We test the validity of this model through a rigorous sequence-based analysis of eleven completely sequenced mammalian and avian genomes. Seven attributes of compositional domains are used in the analyses: (1) the number of compositional domains, (2) the compositional domain-length distribution, (3) the density of compositional domains, (4) genome coverage by the different domain types, (5) the degree of fit to a power-law distribution, (6) compositional domain GC content, and (7) the joint distribution of GC content and length of the different domain types. We discuss the evolution of these attributes in light of two competing phylogenetic hypotheses that differ from each other in the validity of clade Euarchontoglires. If the clade is valid, the compositional organization of the murid genome would be a derived state and would exhibit a high similarity to that of other mammals. If it is invalid, the compositional organization of the murid genome would be closer to an ancestral state. We demonstrate that the compositional organization of the murid genome differs from those of primates and laurasiatherians, a phenomenon previously termed the “murid shift,” and in many ways resembles that of the opossum genome. We find no support for the “isochore theory.” Instead, our findings depict the mammalian genome as a tapestry of mostly short homogeneous and nonhomogeneous domains and a few long ones, thus providing strong evidence in favor of the compositional domain model. Our findings also seem to invalidate clade Euarchontoglires.
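To make some of the listed attributes concrete, the following is a minimal illustrative sketch of how the domain count, domain density, coverage by homogeneous domains, length-weighted GC content, and a power-law exponent could be computed from a list of domains. It is not the procedure used in the study: the `Domain` record, the function names, and the choice of the standard continuous maximum-likelihood estimator for the exponent are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical record for one compositional domain."""
    length: int        # domain length in base pairs
    gc: float          # GC content as a fraction between 0 and 1
    homogeneous: bool  # True if the domain is compositionally homogeneous


def summarize(domains, genome_size_bp):
    """Return a few per-genome summary attributes for a list of domains."""
    n = len(domains)
    total_len = sum(d.length for d in domains)
    homog_len = sum(d.length for d in domains if d.homogeneous)
    return {
        "count": n,                                      # attribute (1)
        "density_per_mb": n / (genome_size_bp / 1e6),    # attribute (3)
        "coverage_homogeneous": homog_len / genome_size_bp,  # attribute (4)
        # Length-weighted mean GC content over all domains, attribute (6)
        "mean_gc": sum(d.gc * d.length for d in domains) / total_len,
    }


def power_law_alpha(lengths, xmin):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Uses alpha = 1 + n / sum(ln(x / xmin)) over all lengths x >= xmin.
    The cutoff xmin is an assumed input here, not estimated from the data.
    """
    tail = [x for x in lengths if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one length strictly above xmin")
    return 1.0 + len(tail) / log_sum
```

A call such as `summarize(domains, genome_size_bp)` returns a dictionary of summary values, and `power_law_alpha([d.length for d in domains], xmin)` gives a point estimate of the exponent. Assessing the degree of fit, attribute (5), would additionally require a goodness-of-fit test, which this sketch does not include.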
