Comparative Gene Prediction Based on Gene Structure Conservation

Identifying protein-coding genes is one of the most important tasks in analyzing newly sequenced genomes. With an increasing number of gene annotations verified by experiments, it has become feasible to identify genes in newly sequenced genomes by comparing them with genes annotated in phylogenetically close organisms. Here, we propose a program, GeneAlign, which predicts the genes in one sequence by measuring the similarity between that sequence and related genes annotated in another genome. The program applies CORAL, a heuristic linear-time alignment tool, to determine whether the regions flanked by candidate signals are similar to the annotated exons. By exploiting both the conservation of gene structures and the sequence homology between protein-coding regions, the approach increases prediction accuracy. GeneAlign was tested on the Projector data set of 449 human-mouse homologous sequence pairs. At the gene level, the sensitivity and specificity of GeneAlign are both 80%; at the exon level, both exceed 96%.
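The core filtering step, keeping only signal-flanked regions that resemble an annotated exon, can be sketched as follows. This is a minimal illustration under stated assumptions, not GeneAlign's implementation. Candidate acceptor and donor sites are taken to be bare `AG`/`GT` dinucleotides. CORAL is replaced by Python's `difflib.SequenceMatcher.ratio()` as a stand-in similarity score. The helper names and the thresholds `min_sim` and `len_tol` are hypothetical. Gene structure conservation is approximated by requiring the candidate region's length to stay close to the annotated exon's length.

```python
from difflib import SequenceMatcher


def candidate_regions(seq, min_len=3):
    """Yield half-open (start, end) regions flanked by a candidate acceptor
    'AG' on the left and a candidate donor 'GT' on the right.
    Assumption: bare dinucleotides stand in for real splice-site models."""
    # The exon starts right after the 'AG' of the acceptor site.
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # The exon ends right before the 'GT' of the donor site.
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    for start in acceptors:
        for end in donors:
            if end - start >= min_len:
                yield start, end


def similarity(a, b):
    """Stand-in for CORAL (hypothetical substitute): difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def predict_exons(target, annotated_exons, min_sim=0.8, len_tol=0.1):
    """Return (start, end, exon, score) for candidate regions in `target`
    that resemble one of the annotated exons from the related genome."""
    hits = []
    for start, end in candidate_regions(target):
        region = target[start:end]
        for exon in annotated_exons:
            # Structure conservation: skip regions whose length differs too
            # much from the annotated exon (tolerance is an assumption).
            if abs(len(region) - len(exon)) > len_tol * len(exon):
                continue
            score = similarity(region, exon)
            if score >= min_sim:
                hits.append((start, end, exon, score))
    return hits


if __name__ == "__main__":
    exon = "CCCTTTAAACCCTTTAAACC"  # toy annotated exon
    target = "TTTAG" + exon + "GTAAGTTT"  # toy target: AG | exon | GT...
    for hit in predict_exons(target, [exon]):
        print(hit)
```

The nested loop over acceptor/donor pairs is quadratic in the number of signals and is kept only for readability; the abstract describes CORAL as a heuristic linear-time aligner, which this stand-in does not attempt to reproduce.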
