Genome-Wide Detection of Alternative Splicing in Expressed Sequences Using Partial Order Multiple Sequence Alignment Graphs

We present a method for high-throughput detection of alternative splicing in expressed sequence data. The method copes with many of the problems inherent in inferring splicing and alternative splicing from EST sequences, which, in addition to being fragmentary and full of sequencing errors, may also be chimeric, misoriented, or contaminated with genomic sequence. It relies on both the Partial Order Alignment (POA) program for constructing multiple sequence alignments and its Heaviest Bundling function for generating consensus sequences. It accounts for the real complexity of expressed sequence data by building and analyzing a single multiple sequence alignment that contains all of the expressed sequences in a given cluster, aligned to genomic sequence. We illustrate the method on human UniGene Cluster Hs.1162, which contains expressed sequences from the human HLA-DMB gene. We have used this method to generate databases, published elsewhere, of splices and alternative splicing relationships for the human, mouse and rat genomes. We present statistics from these calculations, together with the CPU time required to run the method on expressed sequence clusters of varying size, to show that it scales to complete genomes.
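To make the idea of "splices and alternative splicing relationships" concrete, the following is a minimal Python sketch of one simple way such relationships could be read off once expressed sequences have been aligned to genomic sequence. It is not the POA or Heaviest Bundling implementation described above. The function names (`infer_splices`, `alternative_pairs`), the `min_intron` and `min_support` thresholds, the half-open coordinate convention, and the toy coordinates are all assumptions introduced for illustration.

```python
from collections import Counter
from itertools import combinations


def infer_splices(blocks, min_intron=30):
    """Return candidate splices for one expressed sequence.

    `blocks` is a list of half-open (start, end) genomic intervals to which
    the sequence aligns. A gap of at least `min_intron` bases between two
    consecutive blocks is reported as a splice (gap_start, gap_end).
    The threshold is an illustrative assumption, not a published value.
    """
    ordered = sorted(blocks)
    splices = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end >= min_intron:
            splices.append((prev_end, next_start))
    return splices


def alternative_pairs(alignments, min_support=1, min_intron=30):
    """Count support for each splice and pair up conflicting splices.

    Two distinct splices whose genomic gaps overlap cannot both occur in the
    same transcript, so each such pair is reported as a candidate alternative
    splicing relationship.
    """
    support = Counter()
    for blocks in alignments.values():
        # Count each splice at most once per expressed sequence.
        for splice in set(infer_splices(blocks, min_intron)):
            support[splice] += 1

    kept = sorted(s for s, n in support.items() if n >= min_support)
    pairs = [
        (a, b)
        for a, b in combinations(kept, 2)
        if a[0] < b[1] and b[0] < a[1]  # half-open intervals overlap
    ]
    return support, pairs


# Toy cluster with hypothetical coordinates: est2 skips the middle exon.
toy_alignments = {
    "est1": [(100, 200), (300, 400), (500, 600)],
    "est2": [(100, 200), (500, 600)],
    "est3": [(150, 200), (300, 400)],
}
toy_support, toy_pairs = alternative_pairs(toy_alignments)
```

In this toy cluster the splice (200, 500) conflicts with both (200, 300) and (400, 500), the pattern expected for a skipped exon. A real pipeline would also have to handle the sequencing errors, chimeras, misorientation, and genomic contamination noted above, none of which this sketch attempts.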
