A Novel Multi-objectivisation Approach for Optimising the Protein Inverse Folding Problem

In biology, protein structure prediction is of continued interest, not only for charting the molecular map of the living cell but also for designing proteins with new functions. The Inverse Folding Problem (IFP) is an important research problem in its own right and lies at the heart of most rational protein design approaches. In brief, the IFP consists of finding sequences that will fold into a given structure, rather than determining the structure of a given sequence as in conventional structure prediction. In this work we present a Multi-Objective Genetic Algorithm (MOGA) that uses the diversity-as-objective (DAO) variant of multi-objectivisation to optimise secondary structure similarity and sequence diversity at the same time, pushing the search farther into widespread areas of the sequence solution space. To control the high diversity generated by the DAO approach, we add a novel Quantile Constraint (QC) mechanism that discards an adjustable worst quantile of the population. This DAO-QC approach can shift the emphasis from exploration towards exploitation to a selectable degree, achieving a trade-off that produces both better and more diverse sequences than the standard Genetic Algorithm (GA). To validate the final results, a subset of the best sequences was selected for tertiary structure prediction. Superposition with the original protein structure demonstrated that meaningful sequences are generated, underlining the potential of this work.
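The interplay between the two objectives and the Quantile Constraint can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice of mean normalised Hamming distance as the diversity measure, and the decision to rank individuals by their secondary structure similarity score before discarding the worst quantile are all assumptions made for illustration. Similarity scores are taken as given inputs, since the secondary structure predictor is not described here, and equal-length sequences are assumed.

```python
from typing import List, Sequence, Tuple


def mean_hamming_diversity(index: int, population: Sequence[str]) -> float:
    """Mean normalised Hamming distance from one sequence to all others.

    Assumed diversity measure; sequences are assumed to have equal length.
    """
    seq = population[index]
    others = [s for i, s in enumerate(population) if i != index]
    if not others:
        return 0.0
    total = sum(
        sum(a != b for a, b in zip(seq, other)) / len(seq) for other in others
    )
    return total / len(others)


def quantile_constraint(scores: Sequence[float], worst_quantile: float) -> List[int]:
    """Indices that survive after dropping the worst fraction of the population.

    Higher score means better. Ranking by similarity score is an assumption.
    """
    if not 0.0 <= worst_quantile < 1.0:
        raise ValueError("worst_quantile must be in [0, 1)")
    n_drop = int(len(scores) * worst_quantile)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])  # worst first
    return sorted(ranked[n_drop:])


def dao_qc_objectives(
    population: Sequence[str],
    similarity: Sequence[float],
    worst_quantile: float,
) -> List[Tuple[str, float, float]]:
    """Apply the quantile constraint, then pair each survivor with both objectives.

    Returns (sequence, similarity, diversity) tuples; a multi-objective
    selection step would then maximise the last two values together.
    """
    keep = quantile_constraint(similarity, worst_quantile)
    survivors = [population[i] for i in keep]
    return [
        (survivors[j], similarity[i], mean_hamming_diversity(j, survivors))
        for j, i in enumerate(keep)
    ]


if __name__ == "__main__":
    # Toy population with made-up similarity scores.
    pop = ["AAAA", "AAAT", "TTTT", "AATT"]
    sim = [0.9, 0.8, 0.2, 0.6]
    for row in dao_qc_objectives(pop, sim, worst_quantile=0.25):
        print(row)
```

In this sketch, raising `worst_quantile` removes more low-similarity individuals before diversity is rewarded, which shifts the search towards exploitation. Setting it to zero leaves the plain diversity-as-objective behaviour.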
