LOGICOIL - multi-state prediction of coiled-coil oligomeric state

MOTIVATION: The coiled coil is a ubiquitous α-helical protein-structure domain that directs and facilitates protein-protein interactions in a wide variety of biological processes. At the protein-sequence level, the coiled coil is readily recognized by a conspicuous heptad repeat of hydrophobic and polar residues. Structurally, however, coiled coils are more complicated: they exist in a wide range of oligomeric states and topologies. As a consequence, predicting these various states from sequence remains an unmet challenge.

RESULTS: This work introduces LOGICOIL, the first algorithm to address the problem of predicting multiple coiled-coil oligomeric states from protein-sequence information alone. By covering >90% of the known coiled-coil structures, LOGICOIL is a net improvement over existing methods, which achieve a predictive coverage of ∼31% of this population. This leap in predictive power offers better opportunities for genome-scale analysis and for analyses of coiled-coil-containing protein assemblies.

AVAILABILITY: LOGICOIL is available via a web interface at http://coiledcoils.chm.bris.ac.uk/LOGICOIL. Source code, training sets and supporting information can be downloaded from the same site.
