AltTrans: Transcript pattern variants annotated for both alternative splicing and alternative polyadenylation

Background: The three major mechanisms that regulate transcript formation involve the selection of alternative sites for transcription start (TS), splicing, and polyadenylation. Current efforts collect data and annotation for each of these variant types individually. It is important to take an integrated view of these data sets and to derive a data set of alternate transcripts along with consolidated annotation. We have previously developed computational pipelines that generate value-added data at genome scale on individual variant types; these include AltSplice on splicing and AltPAS on polyadenylation. We now extend these pipelines and integrate the resultant data sets to give an integrated view of the contributions of splicing and polyadenylation to the formation of transcript variants.

Description: The AltSplice pipeline examines gene-transcript alignments and delineates alternative splice events and splice patterns. This pipeline is extended as AltTrans to delineate isoform transcript patterns, for each of which both the introns/exons and the 'terminating' polyA site are delineated; EST/mRNA sequences that qualify a transcript pattern confirm both the underlying splicing and the polyadenylation. The AltPAS pipeline examines gene-transcript alignments and delineates all potential polyA sites irrespective of the underlying splicing patterns. The resultant polyA sites from AltTrans and AltPAS are merged. The generated database reports data on alternative splicing, alternative polyadenylation, and the resultant alternate transcript patterns; the basal data are annotated for various biological features. The data (named the integrated AltTrans data), generated for both human and mouse, are made available through the Alternate Transcript Diversity web site at http://www.ebi.ac.uk/atd/.

Conclusion: The reported data set presents alternate transcript patterns that are annotated for both alternative splicing and alternative polyadenylation. Results based on current transcriptome data indicate that the contribution of alternative splicing is larger than that of alternative polyadenylation.
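
The merging of polyA sites reported by AltTrans and AltPAS, mentioned in the Description, can be illustrated with a minimal sketch. The abstract does not state how the merge is performed, so everything below is an assumption made for illustration: the `PolyASite` record, the `merge_polya_sites` function, and the 10-nt `tolerance` window are hypothetical and are not taken from the actual pipelines. The sketch simply clusters sites of the same gene whose positions lie close together and records which pipeline supported each cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolyASite:
    """Hypothetical record for one predicted polyA site."""
    gene: str       # gene identifier
    position: int   # coordinate of the cleavage site on the gene
    source: str     # pipeline that reported it, e.g. "AltTrans" or "AltPAS"


def merge_polya_sites(sites, tolerance=10):
    """Cluster sites of the same gene lying within `tolerance` nt of the
    previous site in the cluster. The window size is an assumed value."""
    merged = []
    for site in sorted(sites, key=lambda s: (s.gene, s.position)):
        last = merged[-1] if merged else None
        if (last and last["gene"] == site.gene
                and site.position - last["end"] <= tolerance):
            # Close enough to the current cluster: extend it.
            last["end"] = site.position
            last["sources"].add(site.source)
        else:
            # Different gene or too far away: start a new cluster.
            merged.append({
                "gene": site.gene,
                "start": site.position,
                "end": site.position,
                "sources": {site.source},
            })
    return merged


if __name__ == "__main__":
    example = [
        PolyASite("G1", 100, "AltTrans"),
        PolyASite("G1", 105, "AltPAS"),
        PolyASite("G1", 500, "AltPAS"),
    ]
    for cluster in merge_polya_sites(example):
        print(cluster["gene"], cluster["start"], cluster["end"],
              sorted(cluster["sources"]))
```

In this sketch a cluster supported by both sources would correspond to a polyA site confirmed in the context of a transcript pattern (AltTrans) and also found independently of splicing (AltPAS); a cluster supported only by AltPAS would be a site not yet tied to a full transcript pattern.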
