CoreGenes: A computational tool for identifying and cataloging "core" genes in a set of small genomes

Background: Improvements in DNA sequencing technology and methodology have led to the rapid expansion of databases of DNA sequence, gene and genome data. Lower operational costs and the heightened interest generated by early novel discoveries in genomics are also contributing to the accumulation of these data sets. A major challenge is to analyze and mine data from these databases, especially whole genomes, and computational tools that look at genomes globally are needed for this kind of data mining.

Results: CoreGenes is a global, Java-based interactive data mining tool that identifies and catalogs a "core" set of genes from two to five small whole genomes simultaneously. CoreGenes performs hierarchical and iterative BLASTP analyses, using one genome as a reference and another as a query. Each subsequent query genome is compared against the most recently generated "consensus." These iterations produce a matrix of related genes from the set of genomes, which may be, e.g., viral, mitochondrial or chloroplast genomes. Currently the software is limited to small genomes of about 330 kilobases or less.

Conclusion: The computational tool CoreGenes has been developed to analyze small whole genomes globally. BLAST score-related and putatively essential "core" gene data are displayed as a table with links to GenBank for further information on the genes of interest. This web resource is available at http://pumpkins.ib3.gmu.edu:8080/CoreGenes or http://www.bif.atcc.org/CoreGenes.
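The iterative reference/query/consensus scheme described above can be illustrated with a minimal Python sketch. It is not the CoreGenes implementation. Several parts are assumptions made for illustration. The toy scoring function `kmer_score` (shared 3-mers) stands in for a BLASTP score. The `threshold` value is hypothetical. The "consensus" is modeled as the rows whose reference protein has so far found a hit in every query genome. Each row of the returned matrix lists one reference protein followed by its best hit in each query genome.

```python
from typing import Callable, Dict, List, Optional, Tuple

Genome = Dict[str, str]          # protein ID -> amino-acid sequence
Hit = Optional[Tuple[str, float]]  # (protein ID, score) or None


def kmer_score(a: str, b: str, k: int = 3) -> float:
    """Hypothetical stand-in for a BLASTP score: count of shared k-mers."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return float(len(kmers_a & kmers_b))


def best_hit(query: str, db: Genome,
             score: Callable[[str, str], float], threshold: float) -> Hit:
    """Return the highest-scoring protein in db, if it passes the threshold."""
    best: Hit = None
    for pid, seq in db.items():
        s = score(query, seq)
        if s >= threshold and (best is None or s > best[1]):
            best = (pid, s)
    return best


def core_genes(genomes: List[Genome],
               score: Callable[[str, str], float] = kmer_score,
               threshold: float = 3.0) -> List[List[str]]:
    """Iteratively narrow the reference proteins to those with hits in every genome."""
    if not 2 <= len(genomes) <= 5:
        raise ValueError("this sketch, like CoreGenes, compares two to five genomes")
    reference = genomes[0]
    # Start with one row per reference protein; the surviving rows are the consensus.
    rows = [[pid] for pid in reference]
    for query in genomes[1:]:
        next_rows = []
        for row in rows:
            # Search the next genome with the row's reference protein (assumption).
            hit = best_hit(reference[row[0]], query, score, threshold)
            if hit is not None:
                next_rows.append(row + [hit[0]])
        rows = next_rows  # the new consensus for the next iteration
    return rows


if __name__ == "__main__":
    ref = {"A1": "MKTAYIAKQR", "A2": "GGSSGGSSGG"}
    q1 = {"B1": "MKTAYIAKQL", "B2": "PPPPPPPP"}
    q2 = {"C1": "TAYIAKQRWW"}
    print(core_genes([ref, q1, q2]))  # [['A1', 'B1', 'C1']]
```

In this toy run, `A2` drops out after the first query because it has no match in `q1`, so only the `A1` row survives into the final matrix. A real pipeline would replace `kmer_score` with scores taken from actual BLASTP searches.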