Segmenting the Human Genome into Isochores

The human genome is a mosaic of isochores: long (>200 kb) DNA sequences that are fairly homogeneous in base composition and can be assigned to five families that together span a GC content of 33%–59%. Although the compartmentalized organization of the mammalian genome has been investigated for more than 40 years, no satisfactory automatic procedure for segmenting the genome into isochores has been available so far. We present a critical discussion of the currently available methods and a new approach, isoSegmenter, which segments the genome into isochores quickly and fully automatically. The approach relies on two types of experimentally defined parameters: the compositional boundaries of the isochore families and an optimal window size of 100 kb. It improves on the existing methods, is well suited to investigating long-range features of sequenced and assembled genomes, and is publicly available at https://github.com/bunop/isoSegmenter.
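The fixed-window idea described above can be illustrated with a minimal Python sketch. It is not the isoSegmenter implementation: the function names (`gc_percent`, `classify`, `segment`) are hypothetical, and the family boundaries (37%, 41%, 46%, 53% GC for L1/L2/H1/H2/H3) are assumed values commonly quoted in the isochore literature, not figures given in this abstract. The sketch computes the GC content of consecutive non-overlapping windows, assigns each window to a family, and merges adjacent windows of the same family.

```python
from bisect import bisect_right

# ASSUMPTION: family boundaries in GC %, taken from the isochore literature,
# not stated in the abstract. A value equal to a boundary goes to the upper family.
FAMILY_BOUNDS = [37.0, 41.0, 46.0, 53.0]
FAMILY_NAMES = ["L1", "L2", "H1", "H2", "H3"]


def gc_percent(seq):
    """Return GC % over A/C/G/T bases only, or None if the window has none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else None


def classify(gc):
    """Map a GC percentage to one of the five family names."""
    return FAMILY_NAMES[bisect_right(FAMILY_BOUNDS, gc)]


def segment(seq, window=100_000):
    """Split seq into fixed windows, label each, and merge equal neighbours.

    Returns a list of (start, end, label) tuples with half-open coordinates.
    Windows without any A/C/G/T base (for example runs of N) are labelled "gap".
    """
    segments = []
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        gc = gc_percent(seq[start:end])
        label = "gap" if gc is None else classify(gc)
        if segments and segments[-1][2] == label:
            # Extend the previous segment instead of opening a new one.
            segments[-1] = (segments[-1][0], end, label)
        else:
            segments.append((start, end, label))
    return segments
```

For a real chromosome one would read the sequence from a FASTA file and call `segment(sequence)` with the default 100 kb window; a smaller `window` is only useful for toy inputs. How the actual tool treats gaps, boundary values, and the final partial window may differ from the choices made here.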
