A computational prediction of isochores based on hidden Markov models.

Mammalian genomes are organised into a mosaic of regions called isochores, generally more than 300 kb in length, each with a relatively homogeneous G+C content that differs from one region to the next. G+C content is the defining characteristic of isochores, but they have also been associated with many other biological properties. For instance, genes are more compact and gene density is highest in G+C rich isochores. Various ways of locating isochores in the human genome have been developed, but these methods use only the base composition of the DNA sequences. The present paper proposes a new method, based on a hidden Markov model, which takes into account several of the biological properties associated with the isochore structure of a genome. This method yields a good segmentation of the human genome into isochores, and also permits a new analysis of the known heterogeneity of G+C rich isochores: most (60%) of the G+C poor genes embedded in G+C rich isochores have UTR sequences characteristic of G+C rich genes. This genomic feature is discussed in the context of both evolution and genome function.
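
To make the idea of hidden Markov model segmentation concrete, the following is a minimal sketch and not the model proposed in the paper: it uses only the G+C fraction of fixed-size windows, whereas the paper's method also draws on other biological properties. The two-state design, the Gaussian emissions, and every parameter value (`means`, `sd`, `stay`) are illustrative assumptions, as are the function names.

```python
import math


def gc_fraction(seq):
    """Fraction of G or C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return sum(1 for base in seq if base in "GC") / len(seq) if seq else 0.0


def log_gauss(x, mean, sd):
    """Log density of a normal distribution, used here as the emission model."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_gc(windows_gc, means=(0.38, 0.52), sd=0.04, stay=0.95):
    """Most likely state path for a list of per-window G+C fractions.

    State 0 stands for a G+C poor region, state 1 for a G+C rich region.
    All parameter defaults are hypothetical, chosen only for illustration.
    """
    if not windows_gc:
        return []
    n_states = len(means)
    log_stay = math.log(stay)
    log_switch = math.log((1 - stay) / (n_states - 1))

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Initialisation with a uniform prior over states.
    score = [
        math.log(1 / n_states) + log_gauss(windows_gc[0], means[s], sd)
        for s in range(n_states)
    ]
    back = []  # back[t][s] = best predecessor of state s at window t + 1

    # Recursion: keep the best-scoring predecessor for each state.
    for x in windows_gc[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + trans(p, s))
            new_score.append(
                score[best_prev] + trans(best_prev, s) + log_gauss(x, means[s], sd)
            )
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Traceback from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In this sketch, a high `stay` probability penalises frequent switching, so a single window with a slightly unusual G+C fraction does not start a new segment, while a sustained shift in composition does. A model closer to the one described above would presumably add further observations per window alongside G+C content, but how the paper combines them is not specified here.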
