Compositional gene landscapes in vertebrates.

The well-conserved linear relationship between the GC levels of the second and third codon positions of genes (GC2, GC3) prompted us to examine the landscape, or joint distribution, spanned by these two variables. In humans, well-curated coding sequences now cover at least 15%-30% of the estimated total gene set. Our analysis of the landscape defined by this gene set revealed not only the well-documented linear crest, but also several peaks and valleys along that crest. The same feature was indicated in two other warm-blooded vertebrates represented by large gene databases, namely mouse and chicken. GC2 is the sum of eight amino acid frequencies, whereas GC3 is linearly related to the GC level of the chromosomal region containing the gene. The landscapes therefore portray relations between proteins and the DNA environments of the genes that encode them.
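To make the two variables concrete, the following is a minimal sketch of how GC2 and GC3 could be computed for a coding sequence and how the resulting pairs could be binned into a coarse landscape. The function names `gc_by_codon_position` and `landscape_counts` and the `bins` parameter are illustrative assumptions; the simple histogram stands in for whatever density estimate the study itself used, which the abstract does not specify.

```python
from collections import Counter


def gc_by_codon_position(cds):
    """Return (GC2, GC3) as fractions for an in-frame coding sequence.

    Assumes the sequence starts at codon position 1; trailing bases that
    do not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    # Fraction of codons with G or C at the second and third positions.
    gc2 = sum(c[1] in "GC" for c in codons) / len(codons)
    gc3 = sum(c[2] in "GC" for c in codons) / len(codons)
    return gc2, gc3


def landscape_counts(genes, bins=10):
    """Count genes per (GC2, GC3) cell on a bins x bins grid.

    This is a plain 2D histogram (an assumption for illustration),
    not a smoothed density estimate.
    """
    grid = Counter()
    for cds in genes:
        gc2, gc3 = gc_by_codon_position(cds)
        # Clamp so that a value of exactly 1.0 falls in the last bin.
        i = min(int(gc2 * bins), bins - 1)
        j = min(int(gc3 * bins), bins - 1)
        grid[(i, j)] += 1
    return grid
```

Applied to a large set of coding sequences, the cell counts approximate the joint distribution of GC2 and GC3; a crest would appear as a band of well-populated cells, and peaks and valleys as local maxima and minima along it.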
