AIDA: ab initio domain assembly for automated multi-domain protein structure prediction and domain-domain interaction prediction

MOTIVATION: Most proteins consist of multiple domains, independent structural and evolutionary units that are often reshuffled in genomic rearrangements to form new protein architectures. Template-based modeling methods can often detect homologous templates for individual domains, but templates that could be used to model the entire query protein are often not available.

RESULTS: We have developed a fast docking algorithm, ab initio domain assembly (AIDA), for assembling multi-domain protein structures, guided by the ab initio folding potential. This approach can be extended to discontinuous domains (i.e. domains with 'inserted' domains). When tested on experimentally solved structures of multi-domain proteins, the relative domain positions were accurately found among the top 5000 models in 86% of cases. The AIDA server can use domain assignments provided by the user or predict them from the provided sequence. The latter approach is particularly useful for automated protein structure prediction servers. A blind test on 95 CASP10 targets shows that domain boundaries could be successfully determined for 97% of targets.

AVAILABILITY AND IMPLEMENTATION: The AIDA package as well as the benchmark sets used here are available for download at http://ffas.burnham.org/AIDA/.

CONTACT: adam@sanfordburnham.org

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
