Computational characterization of parallel dimeric and trimeric coiled-coils using effective amino acid indices.

The coiled-coil, which consists of two or more α-helices winding around each other, is a ubiquitous protein-protein interaction motif and the most frequently observed one in nature. It is known for its straightforward heptad repeat pattern, can be readily recognized from protein primary sequences, and exhibits a variety of oligomeric states and topologies. Because of the stable interactions formed between their α-helices, coiled-coils have been closely studied as building blocks for designing novel protein structures, with potential applications in materials science, synthetic biology and medicine. However, their broader application requires an in-depth and systematic analysis of the sequence-to-structure relationship that governs coiled-coil folding and oligomerization. In this article, we propose a new oligomerization state predictor, termed RFCoil, which combines the most useful and non-redundant amino acid indices with the random forest (RF) machine learning algorithm to predict the oligomeric states of coiled-coil regions. Benchmarking experiments show that RFCoil achieves an AUC (area under the ROC curve) of 0.849 in 10-fold cross-validation on the training dataset and 0.855 in the independent test on the validation dataset. Performance comparisons indicate that RFCoil outperforms the four existing predictors LOGICOIL, PrOCoil, SCORER 2.0 and Multicoil2. Furthermore, we extract from the trained RF model a number of predominant rules that underlie oligomer formation. We also present two case studies to illustrate how the extracted rules apply to the prediction of coiled-coil oligomerization state. The RFCoil web server, source code and datasets are freely available for academic users at http://protein.cau.edu.cn/RFCoil/.
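As a rough illustration of how an amino acid index can turn a coiled-coil sequence into numeric features for a classifier such as a random forest, the sketch below averages one index over each heptad position (a to g). It is not the RFCoil feature encoding, which the abstract does not specify. The function name `heptad_index_features`, the choice of the Kyte-Doolittle hydropathy index (AAindex entry KYTJ820101), and the assumption that a heptad register string is already available are all hypothetical choices made for this example.

```python
from collections import defaultdict

# Kyte-Doolittle hydropathy values (AAindex entry KYTJ820101).
# Chosen only as an example index; the indices RFCoil selects are not listed here.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

HEPTAD_POSITIONS = "abcdefg"


def heptad_index_features(sequence, register, index=KYTE_DOOLITTLE):
    """Return 7 values: the mean index value at heptad positions a..g.

    `register` is an assumed input: one letter (a-g) per residue.
    Positions that never occur in the register get 0.0.
    Residues missing from `index` raise KeyError.
    """
    if len(sequence) != len(register):
        raise ValueError("sequence and register must have the same length")

    sums = defaultdict(float)
    counts = defaultdict(int)
    for residue, position in zip(sequence.upper(), register.lower()):
        if position not in HEPTAD_POSITIONS:
            raise ValueError(f"unexpected heptad position: {position!r}")
        sums[position] += index[residue]
        counts[position] += 1

    return [
        sums[p] / counts[p] if counts[p] else 0.0
        for p in HEPTAD_POSITIONS
    ]


if __name__ == "__main__":
    # Two idealized heptads with hydrophobic residues at positions a and d.
    features = heptad_index_features("LEALKEKLAALKEK", "abcdefgabcdefg")
    print(dict(zip(HEPTAD_POSITIONS, features)))
```

A vector like this, computed for several indices and concatenated, could be passed as one row of a feature matrix to any random forest implementation; the labels would be the known oligomeric states (dimer or trimer) of the training sequences.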
