Integrating alternative splicing detection into gene prediction

Background: Alternative splicing (AS) is now considered a major contributor to transcriptome and proteome diversity, and it cannot be neglected when annotating a new genome. Despite considerable progress in the accuracy of computational gene prediction, reliably predicting AS variants where local experimental evidence supports them remains an open challenge for gene finders.

Results: We used a new integrative approach that incorporates AS detection into ab initio gene prediction. The method relies on the analysis of genomically aligned transcript sequences (ESTs and/or cDNAs) and has been implemented in the dynamic programming algorithm of the graph-based gene finder EuGÈNE. Given a genomic sequence and a set of aligned transcripts, this new version identifies the transcripts that carry evidence of alternative splicing events and provides, in addition to the classical optimal gene prediction, alternative optimal predictions chosen among those consistent with the detected AS events. A single gene can thus receive multiple annotations, each predicted variant being supported by transcript evidence, though not necessarily over its full length.

Conclusions: This automatic combination of experimental data analysis and ab initio gene finding integrates the prediction of alternatively spliced genes into a single annotation pipeline.
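To make the detection step more concrete, the following is a minimal sketch of one way to flag aligned transcripts that carry evidence of alternative splicing. It is not EuGÈNE's implementation: the abstract does not specify the criterion used, so the rule here (two alignments are incompatible when an intron of one overlaps an exon of the other) is an assumption, as are the function names and the representation of each alignment as a sorted list of inclusive `(start, end)` exon blocks.

```python
from itertools import combinations


def introns(exons):
    """Return the introns implied by consecutive exon blocks of one alignment.

    `exons` is a sorted list of inclusive (start, end) genomic coordinates.
    """
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def conflicts(exons_a, exons_b):
    """True if an intron of one alignment overlaps an exon of the other.

    This is a simplified, assumed incompatibility rule, not the published one.
    """
    def intron_hits_exon(src, dst):
        return any(i_start <= e_end and e_start <= i_end
                   for i_start, i_end in introns(src)
                   for e_start, e_end in dst)

    return intron_hits_exon(exons_a, exons_b) or intron_hits_exon(exons_b, exons_a)


def transcripts_with_as_evidence(alignments):
    """Return the names of transcripts involved in at least one conflict.

    `alignments` maps a transcript name to its list of exon blocks.
    """
    flagged = set()
    for (name_a, ex_a), (name_b, ex_b) in combinations(alignments.items(), 2):
        if conflicts(ex_a, ex_b):
            flagged.update((name_a, name_b))
    return flagged


# Hypothetical alignments: est2 uses a different donor site than est1,
# while est3 lies elsewhere on the sequence and conflicts with neither.
alignments = {
    "est1": [(100, 200), (300, 400)],
    "est2": [(100, 250), (300, 400)],
    "est3": [(700, 800), (900, 950)],
}
flagged = transcripts_with_as_evidence(alignments)
```

In this sketch each flagged transcript would then constrain one additional run of the gene finder, giving one alternative prediction per incompatible piece of evidence. The pairwise comparison is quadratic in the number of transcripts, which is acceptable for illustration but would need indexing by locus on real EST collections.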
