Gene finding in novel genomes

Background: Computational gene prediction continues to be an important problem, especially for genomes with little experimental data.

Results: I introduce the SNAP gene finder which has been designed to be easily adaptable to a variety of genomes. In novel genomes without an appropriate gene finder, I demonstrate that employing a foreign gene finder can produce highly inaccurate results, and that the most compatible parameters may not come from the nearest phylogenetic neighbor. I find that foreign gene finders are more usefully employed to bootstrap parameter estimation and that the resulting parameters can be highly accurate.

Conclusion: Since gene prediction is sensitive to species-specific parameters, every genome needs a dedicated gene finder.
