Cooperative Metaheuristics for Exploring Proteomic Data

Most combinatorial optimization problems cannot be solved exactly. A class of methods, called metaheuristics, has proved efficient at giving good approximate solutions in a reasonable time. Cooperative metaheuristics are a subset of metaheuristics that involve a parallel exploration of the search space by several entities, with information exchange between them. The importance of information exchange in the optimization process is related to the building block hypothesis of evolutionary algorithms, which is based on two questions: what is the pertinent information in a given potential solution, and how can this information be shared? A classification of cooperative metaheuristic methods according to the nature of the cooperation involved is presented, and the specific properties of each class, as well as a way to combine them, are discussed. Several improvements in the field of metaheuristics are also given. In particular, a method to regulate the use of classical genetic operators and to define new, more pertinent ones is proposed, taking advantage of a building-block-structured representation of the explored space. Finally, a hierarchical approach resting on multiple levels of cooperative metaheuristics is presented, leading to the definition of a complete concerted cooperation strategy. Some applications of these concepts to difficult proteomics problems, including automatic protein identification, biological motif inference, and multiple sequence alignment, are presented.
For each application, an innovative method based on the cooperation concept is given and compared with classical approaches. In the protein identification problem, a first level of cooperation using swarm intelligence is applied to the comparison of mass spectrometric data with a biological sequence database, followed by a genetic programming method to discover an optimal scoring function. The multiple sequence alignment problem is decomposed into three steps involving several evolutionary processes to infer different kinds of biological motifs and a concerted cooperation strategy to build the sequence alignment according to their motif content.
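
To make the idea of parallel exploration with information exchange concrete, the following is a minimal sketch of one common form of cooperation, an island-model genetic algorithm. It is not the method developed in this work: the toy OneMax objective, the ring migration topology, and every function and parameter name (`fitness`, `evolve`, `migrate`, `island_model`, `migration_interval`) are assumptions chosen purely for illustration.

```python
import random


def fitness(bits):
    # Toy objective (OneMax): the number of 1 bits. A stand-in for a real score.
    return sum(bits)


def evolve(pop, rng, mutation_rate):
    # One generation on a single island: elitism, binary tournament selection,
    # uniform crossover, and bit-flip mutation.
    children = [max(pop, key=fitness)[:]]  # keep a copy of the current best
    while len(children) < len(pop):
        p1 = max(rng.sample(pop, 2), key=fitness)
        p2 = max(rng.sample(pop, 2), key=fitness)
        child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        children.append(child)
    return children


def migrate(islands):
    # Information exchange on a ring: a copy of each island's best individual
    # replaces the worst individual of the next island.
    bests = [max(pop, key=fitness)[:] for pop in islands]
    for i, pop in enumerate(islands):
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = bests[i - 1]


def island_model(n_islands=4, pop_size=20, n_bits=40, generations=60,
                 migration_interval=10, seed=0):
    # Several populations search independently and exchange their best
    # individuals every `migration_interval` generations.
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [evolve(pop, rng, 1.0 / n_bits) for pop in islands]
        if gen % migration_interval == 0:
            migrate(islands)
    return max((ind for pop in islands for ind in pop), key=fitness)


if __name__ == "__main__":
    best = island_model()
    print(fitness(best), "ones out of", len(best))
```

In this sketch the only information shared between islands is a complete candidate solution. Exchanging partial structures such as building blocks, as discussed above, would require a richer representation than a flat bit string.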
