Importing statistical measures into Artemis enhances gene identification in the Leishmania genome project

Background: Seattle Biomedical Research Institute (SBRI), as part of the Leishmania Genome Network (LGN), is sequencing chromosomes of the trypanosomatid protozoan species Leishmania major. At SBRI, chromosomal sequence is annotated using a combination of trained and untrained non-consensus gene-prediction algorithms with ARTEMIS, an annotation platform with rich and user-friendly interfaces.

Results: Here we describe a methodology used to import results from three different protein-coding gene-prediction algorithms (GLIMMER, TESTCODE and GENESCAN) into the ARTEMIS sequence viewer and annotation tool. Comparison of these methods, along with the CODON USAGE algorithm built into ARTEMIS, shows the importance of combining methods to annotate the L. major genomic sequence more accurately.

Conclusion: An improvised and powerful tool for gene prediction has been developed by importing data from widely used algorithms into an existing annotation platform. This approach is especially fruitful in the Leishmania genome project, where there is a large proportion of novel genes requiring manual annotation.
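As an illustration of the kind of import step described in the Results, the following minimal sketch converts windowed scores from an external predictor into a per-base plot file. It is not the authors' implementation. It assumes that the viewer accepts a plain-text plot file with one numeric value per sequence position, and that the predictor reports scores for windows given as 1-based inclusive coordinates. The function names `per_base_scores` and `format_plot` and the example values are hypothetical.

```python
from typing import Iterable, List, Tuple

# A window is (start, end, score) with 1-based inclusive coordinates (assumed).
Window = Tuple[int, int, float]


def per_base_scores(seq_length: int,
                    windows: Iterable[Window],
                    default: float = 0.0) -> List[float]:
    """Expand windowed scores into one score per base.

    Bases not covered by any window keep `default`. Where windows overlap,
    the window listed later overwrites the earlier one.
    """
    scores = [default] * seq_length
    for start, end, score in windows:
        first = max(start, 1)          # clip to the sequence bounds
        last = min(end, seq_length)
        for pos in range(first, last + 1):
            scores[pos - 1] = score    # convert 1-based position to 0-based index
    return scores


def format_plot(scores: Iterable[float]) -> str:
    """Render scores as text, one value per line (assumed plot-file layout)."""
    return "".join(f"{value:.3f}\n" for value in scores)


if __name__ == "__main__":
    # Hypothetical windowed output from a coding-potential statistic.
    example_windows = [(1, 3, 0.95), (4, 6, 0.20)]
    plot_text = format_plot(per_base_scores(8, example_windows))
    with open("example.plot", "w") as handle:
        handle.write(plot_text)
```

Writing one value per base keeps the imported scores aligned with the sequence coordinates, so predictions from different algorithms can be compared position by position alongside a built-in measure such as codon usage.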
