A New Database (GCD) on Genome Composition for Eukaryote and Prokaryote Genome Sequences and Their Initial Analyses

Eukaryote genomes contain many noncoding regions and are highly complex. To help characterize this complexity, we constructed the Genome Composition Database (GCD), which holds whole-genome composition statistics for 101 eukaryote genomes and more than 1,000 prokaryote genomes. For each genome, we counted the frequencies of all possible oligonucleotides of length 1 to 10 and compared these observed values with expected values computed from the observed frequencies of oligonucleotides of length 1–4. Deviations from the expected values were much larger for eukaryotes than for prokaryotes, with fungal genomes as the exception. Among animals, mammalian genomes showed the largest deviations. The results of these comparisons are available online at http://esper.lab.nig.ac.jp/genome-composition-database/.
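The observed-versus-expected comparison described above can be illustrated with a minimal sketch. It assumes the simplest case only: expected counts are derived from mononucleotide (length 1) frequencies under independence. The function names `count_kmers`, `expected_from_mono`, and `obs_exp_ratio` are hypothetical and are not taken from the database; the expectations based on oligonucleotides of length 2–4 may be computed differently.

```python
from collections import Counter
from math import prod


def count_kmers(seq, k):
    """Count overlapping oligonucleotides of length k, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def expected_from_mono(seq, kmer):
    """Expected count of kmer if bases were independent (assumption: order-0 model)."""
    mono = count_kmers(seq, 1)
    total = sum(mono.values())
    # Number of valid windows of this length actually observed in the sequence.
    n_windows = sum(count_kmers(seq, len(kmer)).values())
    return n_windows * prod(mono[base] / total for base in kmer)


def obs_exp_ratio(seq, kmer):
    """Observed/expected ratio; values far from 1 indicate over- or under-representation."""
    observed = count_kmers(seq, len(kmer))[kmer]
    expected = expected_from_mono(seq, kmer)
    return observed / expected if expected > 0 else float("nan")


if __name__ == "__main__":
    toy = "ACGTACGT"  # toy sequence, not real genome data
    print(count_kmers(toy, 2))
    print(obs_exp_ratio(toy, "AC"))
```

On a real genome the same counting would be run for every length from 1 to 10; a ratio close to 1 means the oligonucleotide occurs about as often as its base composition alone would predict.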
