Toward efficient multiple molecular sequence alignment: a system of genetic algorithm and dynamic programming

Multiple biomolecular sequence alignment is among the most important and challenging tasks in computational biology. Its main difficulty is high time complexity. This paper reports a multiple-sequence alignment system that combines genetic algorithms with pairwise dynamic programming. Genetic algorithms are stochastic methods for efficient and robust search. Biomolecular sequence alignment is first recast as a search for an optimal or near-optimal point in a solution space, and a genetic algorithm is used to find match blocks efficiently. Pairwise dynamic programming is then applied to the subsequences between the match blocks. By combining the strengths of the two methods, the system achieves both high efficiency and high alignment quality. The paper describes the system in detail, analyzes its performance, and presents experimental results.
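
The second stage described above, dynamic programming restricted to the regions between match blocks, can be illustrated with a minimal sketch for two sequences. This is not the system's actual implementation: the function names `needleman_wunsch` and `align_with_blocks`, the scoring values (match +1, mismatch -1, linear gap -1), and the block format `(start_a, start_b, length)` are all assumptions made for illustration, and the match blocks are supplied by hand rather than found by a genetic algorithm.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return the two gapped strings.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-')
            i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1])
            j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b))


def align_with_blocks(a, b, blocks):
    """Align a and b, keeping each match block fixed as ungapped columns.

    blocks: hypothetical format, a list of (start_a, start_b, length) tuples,
    sorted and non-overlapping, where the two substrings are identical.
    Only the segments between blocks are aligned by dynamic programming.
    """
    parts_a, parts_b = [], []
    pos_a = pos_b = 0
    for start_a, start_b, length in blocks:
        seg_a, seg_b = needleman_wunsch(a[pos_a:start_a], b[pos_b:start_b])
        parts_a += [seg_a, a[start_a:start_a + length]]
        parts_b += [seg_b, b[start_b:start_b + length]]
        pos_a, pos_b = start_a + length, start_b + length
    # Align whatever remains after the last block.
    tail_a, tail_b = needleman_wunsch(a[pos_a:], b[pos_b:])
    parts_a.append(tail_a)
    parts_b.append(tail_b)
    return ''.join(parts_a), ''.join(parts_b)


if __name__ == "__main__":
    seq_a, seq_b = "ACGTTGCA", "ACGTGCA"
    # Two hand-picked match blocks: "ACG" and "GCA".
    row_a, row_b = align_with_blocks(seq_a, seq_b, [(0, 0, 3), (5, 4, 3)])
    print(row_a)
    print(row_b)
```

Because the quadratic-cost dynamic programming runs only on the short segments between blocks, its total cost depends on the segment lengths rather than the full sequence lengths, which is the efficiency argument the abstract makes. Extending this sketch to more than two sequences would require a strategy for merging pairwise results, which is not shown here.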
