Improved protein complex prediction with AlphaFold-multimer by denoising the MSA profile

Structure prediction of protein complexes has improved significantly with AlphaFold2 and AlphaFold-multimer (AFM), but only 60% of dimers are accurately predicted. One way to improve the predictions is to inject noise into the network so that it generates more diverse predictions. However, in difficult cases thousands of predictions are needed to obtain a few that are accurate. Here, we instead learn a bias to the MSA representation that improves the predictions by performing gradient descent through the AFM network. This effectively denoises the MSA profile, similar to how a blurry image would be sharpened. On seven difficult targets from CASP15, the procedure increases the average MMscore to 0.76, compared to 0.63 with AFM. We further evaluate the procedure on 334 protein complexes where AFM fails and demonstrate an increased success rate (MMscore > 0.75) of 8% on these hard targets. Our protocol, AFProfile, provides a way to direct predictions towards a defined target function guided by the MSA. We expect gradient descent over the MSA to be useful for other tasks as well, such as generating alternative conformations. AFProfile is freely available at: https://github.com/patrickbryant1/AFProfile
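The idea of learning a bias by gradient descent while the predictor stays fixed can be illustrated with a toy sketch. This is not the AFProfile implementation and does not call AFM: the "network" here is only a row-wise softmax, the loss is the mean per-position entropy (an assumed stand-in for a confidence-based target function), and the names `toy_loss`, `toy_loss_grad`, `learn_bias`, and `noisy_profile` are hypothetical. In practice the gradient would come from automatic differentiation through the real network rather than the hand-derived formula used below.

```python
import math

def softmax(row):
    """Numerically stable softmax over one profile position."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def toy_loss(profile, bias):
    """Assumed stand-in loss: mean entropy of softmax(profile + bias)."""
    total = 0.0
    for p_row, b_row in zip(profile, bias):
        probs = softmax([p + b for p, b in zip(p_row, b_row)])
        total -= sum(q * math.log(q) for q in probs)
    return total / len(profile)

def toy_loss_grad(profile, bias):
    """Analytic gradient of toy_loss with respect to the bias.

    For H = -sum_i p_i log p_i with p = softmax(z):
    dH/dz_j = -p_j * (log p_j + H).
    """
    n = len(profile)
    grad = []
    for p_row, b_row in zip(profile, bias):
        probs = softmax([p + b for p, b in zip(p_row, b_row)])
        entropy = -sum(q * math.log(q) for q in probs)
        grad.append([-q * (math.log(q) + entropy) / n for q in probs])
    return grad

def learn_bias(profile, steps=200, lr=1.0):
    """Plain gradient descent on the bias; the profile itself is never modified."""
    bias = [[0.0] * len(row) for row in profile]
    history = []
    for _ in range(steps):
        history.append(toy_loss(profile, bias))
        grad = toy_loss_grad(profile, bias)
        bias = [[b - lr * g for b, g in zip(b_row, g_row)]
                for b_row, g_row in zip(bias, grad)]
    history.append(toy_loss(profile, bias))
    return bias, history

# Hypothetical noisy profile: 3 positions, 4 residue types (logit scale).
noisy_profile = [
    [1.0, 0.6, 0.2, 0.1],
    [0.3, 0.9, 0.4, 0.2],
    [0.2, 0.1, 0.5, 0.8],
]
bias, history = learn_bias(noisy_profile)
sharpened = [softmax([p + b for p, b in zip(p_row, b_row)])
             for p_row, b_row in zip(noisy_profile, bias)]
```

In this toy setting the learned bias lowers the loss and makes each position more peaked around the residue type that was already most likely, which is the sense in which the profile is "sharpened". A different target function would steer the bias elsewhere.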
