Algorithms in Bioinformatics

In the last few years, it has become routine to use gene-order data to reconstruct phylogenies on small, duplication-free genomes, both in terms of edge distances (parsimonious sequences of operations that transform one endpoint of an edge into the other) and in terms of genomes at internal nodes. Current gene-order methods break down, though, when the genomes contain more than a few hundred genes, carry high copy numbers of duplicated genes, or produce tree edges longer than one hundred operations. We have constructed a series of heuristics that overcome these obstacles and reconstruct edge distances and genomes at internal nodes for groups of larger, more complex genomes. We present results from the analysis of a group of thirteen modern γ-proteobacteria, as well as from simulated datasets.
