Rose: generating sequence families

MOTIVATION: We present a new probabilistic model of the evolution of RNA-, DNA-, or protein-like sequences and a software tool, Rose, that implements this model. Guided by an evolutionary tree, a family of related sequences is created from a common ancestor sequence by insertion, deletion and substitution of characters. During this artificial evolutionary process, the 'true' history is logged and the 'correct' multiple sequence alignment is created simultaneously. The model also allows mutation rates to vary along the sequences, so that conserved regions, so-called sequence motifs, can be established.

RESULTS: The data created by Rose are suitable for evaluating methods for multiple sequence alignment computation and for the prediction of phylogenetic relationships. They can also be useful for teaching courses on sequence evolution, for developing models of sequence evolution, and for studying evolutionary processes.

AVAILABILITY: Rose is available on the Bielefeld Bioinformatics WebServer under the following URL: http://bibiserv.TechFak.Uni-Bielefeld.DE/rose/ The source code is available upon request.

CONTACT: folker@TechFak.Uni-Bielefeld.DE
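The process described in the abstract can be illustrated with a small Python sketch. It is not Rose's implementation or its interface: the names `simulate` and `mutate`, the tree encoding (nested lists of `(subtree, rounds)` pairs), the per-round probabilities and the DNA-like alphabet are all assumptions made for illustration. Each character carries a site identifier, which lets the 'correct' alignment be read off at the end, and a per-site rate multiplier of 0 keeps a motif conserved. Unlike Rose, the sketch does not log the sequences at internal nodes.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA-like alphabet


def mutate(seq, rates, order, counter, rng, p_sub, p_ins, p_del):
    """Apply one round of mutation to seq, a list of (site_id, char) pairs."""
    out = []
    for site, ch in seq:
        r = rates.get(site, 1.0)  # inserted sites default to rate 1.0
        u = rng.random()
        if u < p_del * r:
            continue  # deletion: the column remains and shows as a gap
        if u < (p_del + p_sub) * r:
            ch = rng.choice([c for c in ALPHABET if c != ch])  # substitution
        out.append((site, ch))
        if rng.random() < p_ins * r:
            new = counter[0]
            counter[0] += 1
            # The new column is placed directly after its left neighbour.
            order.insert(order.index(site) + 1, new)
            out.append((new, rng.choice(ALPHABET)))
    return out


def simulate(tree, ancestor, site_rates=None, seed=0,
             p_sub=0.05, p_ins=0.01, p_del=0.01):
    """Evolve `ancestor` along `tree` and return the true alignment.

    A leaf is a name (str); an internal node is a list of
    (subtree, rounds) pairs, where rounds is an integer branch length.
    """
    rng = random.Random(seed)
    order = list(range(len(ancestor)))      # global left-to-right column order
    counter = [len(ancestor)]               # next unused site id
    rates = dict(enumerate(site_rates or []))
    leaves = {}

    def walk(node, seq):
        if isinstance(node, str):
            leaves[node] = dict(seq)
            return
        for child, rounds in node:
            s = seq
            for _ in range(rounds):
                s = mutate(s, rates, order, counter, rng, p_sub, p_ins, p_del)
            walk(child, s)

    walk(tree, list(enumerate(ancestor)))
    # Keep only columns that are occupied in at least one leaf.
    used = [s for s in order if any(s in d for d in leaves.values())]
    return {name: "".join(d.get(s, "-") for s in used)
            for name, d in leaves.items()}


if __name__ == "__main__":
    tree = [("A", 3), ([("B", 2), ("C", 2)], 1)]
    motif_rates = [1] * 4 + [0] * 7 + [1] * 4  # rate 0 conserves GATTACA
    for name, row in simulate(tree, "ACGTGATTACAACGT", motif_rates).items():
        print(name, row)
```

Because every row is built from the same column order, all rows have equal length, and removing the gap characters from a row recovers the corresponding leaf sequence.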
