Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome

With several rice genome projects approaching completion, gene prediction by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNAs to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences containing 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENESH and BGF. The predictions were compared at the nucleotide, exon and whole-gene-structure levels using commonly accepted measures and several new measures. The test results show performance improving in the chronological order in which the programs were developed. At the same time, the complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene finders.
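To make the nucleotide-level and exon-level comparison concrete, the following is a minimal sketch of how sensitivity and specificity are commonly computed in gene-finder evaluations. It is an illustration, not the evaluation code used in this study: the function names, the exon representation as 1-based inclusive (start, end) pairs, and the toy coordinates are all assumptions. Following the usual gene-prediction convention, "specificity" here means TP / (TP + FP), the fraction of predictions that are correct.

```python
from typing import List, Set, Tuple

# Hypothetical representation: an exon is a 1-based inclusive (start, end) pair.
Exon = Tuple[int, int]


def coding_positions(exons: List[Exon]) -> Set[int]:
    """Return the set of nucleotide positions covered by the exons."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_sn_sp(annotated: List[Exon], predicted: List[Exon]) -> Tuple[float, float]:
    """Nucleotide-level sensitivity and specificity.

    Sn = TP / annotated coding nucleotides
    Sp = TP / predicted coding nucleotides
    """
    truth = coding_positions(annotated)
    guess = coding_positions(predicted)
    true_pos = len(truth & guess)
    sn = true_pos / len(truth) if truth else 0.0
    sp = true_pos / len(guess) if guess else 0.0
    return sn, sp


def exon_sn_sp(annotated: List[Exon], predicted: List[Exon]) -> Tuple[float, float]:
    """Exon-level sensitivity and specificity.

    A predicted exon counts as correct only if both boundaries match exactly.
    """
    truth = set(annotated)
    guess = set(predicted)
    true_pos = len(truth & guess)
    sn = true_pos / len(truth) if truth else 0.0
    sp = true_pos / len(guess) if guess else 0.0
    return sn, sp


def gene_is_correct(annotated: List[Exon], predicted: List[Exon]) -> bool:
    """Whole-gene level: every exon of the gene must be predicted exactly."""
    return sorted(annotated) == sorted(predicted)


# Toy example (made-up coordinates): one exon is exact, one has a shifted
# start, and one predicted exon has no annotated counterpart.
annotated_exons = [(101, 200), (301, 400)]
predicted_exons = [(101, 200), (311, 400), (501, 520)]

nuc_sn, nuc_sp = nucleotide_sn_sp(annotated_exons, predicted_exons)
ex_sn, ex_sp = exon_sn_sp(annotated_exons, predicted_exons)
print(f"nucleotide Sn={nuc_sn:.3f} Sp={nuc_sp:.3f}")
print(f"exon       Sn={ex_sn:.3f} Sp={ex_sp:.3f}")
print("gene correct:", gene_is_correct(annotated_exons, predicted_exons))
```

The toy case shows why the three levels can disagree: a prediction that misses one exon boundary by ten bases still scores well at the nucleotide level, loses that exon at the exon level, and fails entirely at the whole-gene level.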
