Identification of true EST alignments and exon regions of gene sequences

Expressed sequence tags (ESTs), which have accumulated in large numbers, are a valuable resource for finding new genes and disease-relevant genes, and for recognizing alternative splicing variants, SNP sites, and similar features. A prerequisite for such studies is to correctly determine which ESTs are related to a given gene sequence. Based on an analysis of alignments between known gene sequences and ESTs in public databases, several measures, including Identity Check, Gap Check, Inclusion Check and Length Check, have been introduced to judge whether an EST alignment is related to a gene sequence. A computational program, EDSAc1.0, has been developed to identify true EST alignments and exon regions of query gene sequences. When tested on the human gene sequences in the standard dataset HMR195 and evaluated with the standard measures of gene prediction performance, EDSAc1.0 identifies protein-coding regions with a specificity of 0.997 and a sensitivity of 0.88 at the nucleotide level, outperforming its counterpart TAP. A web server for EDSAc1.0 is available at http://infosci.hust.edu.cn.
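The nucleotide-level figures quoted above can be illustrated with a minimal sketch. It assumes the usual gene-prediction convention, in which sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP), counted over coding nucleotides. The function names and the toy exon coordinates are hypothetical and are not taken from EDSAc1.0 or HMR195.

```python
def to_positions(exons):
    """Expand 1-based inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level.

    Assumed convention: sensitivity = TP / (TP + FN),
    specificity = TP / (TP + FP), where a true positive is a
    nucleotide that is coding in both annotation and prediction.
    """
    annotated = to_positions(annotated_exons)
    predicted = to_positions(predicted_exons)
    tp = len(annotated & predicted)   # coding in both
    fn = len(annotated - predicted)   # annotated coding, missed by prediction
    fp = len(predicted - annotated)   # predicted coding, not annotated
    sensitivity = tp / (tp + fn) if annotated else 0.0
    specificity = tp / (tp + fp) if predicted else 0.0
    return sensitivity, specificity


# Toy example (hypothetical coordinates): the second exon is only half recovered.
annotated = [(101, 200), (301, 400)]
predicted = [(101, 200), (301, 350)]
sn, sp = nucleotide_accuracy(annotated, predicted)
print(f"Sn = {sn:.2f}, Sp = {sp:.2f}")  # prints "Sn = 0.75, Sp = 1.00"
```

In this toy case every predicted nucleotide is truly coding, so specificity is 1.00, while a quarter of the annotated coding nucleotides are missed, so sensitivity is 0.75. This is the same pattern, high specificity with lower sensitivity, as the figures reported in the abstract.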
