Generating Benchmarks for Multiple Sequence Alignments and Phylogenetic Reconstructions

We present a new probabilistic model of the evolution of RNA-, DNA-, or protein-like sequences, together with a tool, rose, that implements it. Starting from a common ancestor, a family of sequences is created by insertion, deletion, and substitution of characters. During this artificial evolutionary process, the "true" history is logged and the "correct" multiple sequence alignment is built at the same time. Mutation rates may also vary within the sequences, which makes it possible to establish so-called sequence motifs. The resulting data are suitable for evaluating methods that compute multiple sequence alignments and methods that predict phylogenetic relationships.
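
The general idea of evolving a family from a common ancestor while recording the true alignment can be illustrated with a small sketch. This is not the rose implementation or its probabilistic model: the function names (`mutate`, `evolve`), the DNA alphabet, the fixed per-site probabilities, the complete binary tree, and the single-character indels are all assumptions chosen for brevity, and site-specific mutation rates (motifs) are left out. Each residue carries a column identifier, so the alignment can be read off directly at the end.

```python
import itertools
import random

ALPHABET = "ACGT"  # assumed DNA-like alphabet


def mutate(seq, order, ids, rng, p_sub=0.1, p_ins=0.02, p_del=0.02):
    """Return a mutated copy of seq, a list of (column_id, char) pairs.

    order is the shared left-to-right list of alignment columns;
    ids yields fresh column identifiers for inserted characters.
    """
    child = []
    for col, ch in seq:
        r = rng.random()
        if r < p_del:
            continue  # deletion: the residue is dropped, its column remains
        if r < p_del + p_sub:
            # substitution: replace with a different character
            ch = rng.choice([a for a in ALPHABET if a != ch])
        child.append((col, ch))
        if rng.random() < p_ins:
            # insertion: open a new column directly to the right of this one
            new_col = next(ids)
            order.insert(order.index(col) + 1, new_col)
            child.append((new_col, rng.choice(ALPHABET)))
    return child


def evolve(root_len=30, depth=3, seed=0):
    """Evolve a root sequence down a complete binary tree.

    Returns (history, alignment): history lists (parent, child) edges,
    alignment maps each leaf name to its gapped row.
    """
    rng = random.Random(seed)
    order = list(range(root_len))
    ids = itertools.count(root_len)
    root = [(i, rng.choice(ALPHABET)) for i in range(root_len)]
    history, leaves = [], {}

    def recurse(name, seq, d):
        if d == 0:
            leaves[name] = seq
            return
        for suffix in "LR":
            child_name = name + suffix
            history.append((name, child_name))  # log the true history
            recurse(child_name, mutate(seq, order, ids, rng), d - 1)

    recurse("root", root, depth)

    # Keep only columns that still hold a residue in at least one leaf.
    used = {col for seq in leaves.values() for col, _ in seq}
    columns = [c for c in order if c in used]
    alignment = {}
    for name, seq in leaves.items():
        by_col = dict(seq)
        alignment[name] = "".join(by_col.get(c, "-") for c in columns)
    return history, alignment


if __name__ == "__main__":
    history, alignment = evolve()
    for name, row in sorted(alignment.items()):
        print(f"{name:8s} {row}")
```

Because every residue keeps the column it was born in, the rows returned by `evolve` form the alignment implied by the simulated history, which is the kind of reference result a benchmark needs.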
