ProMuteHT: A High Throughput Compute Pipeline for Generating Protein Mutants in silico

Understanding how an amino acid substitution affects a protein's structure is fundamental to advancing drug design and protein docking studies. Mutagenesis experiments on physical proteins provide a precise assessment of the effects of mutations, but they are prohibitive in both time and cost. Computational approaches for performing in silico amino acid substitutions are available, but they are not suited to generating the large numbers of protein variants needed for high-throughput screening studies. We present ProMuteHT, a program for high-throughput in silico generation of user-specified sets of mutant protein structures with single or multiple amino acid substitutions. We combine our custom mutation algorithm with external side-chain homology modeling libraries to generate energetically feasible mutant structures. Our command-line invocation syntax requires only a few arguments to specify large datasets of mutant structures. We achieve quick run-times through a hybrid approach that limits the use of costly energy calculations when mutating from a large to a small amino acid. We compare our mutant structures with those generated by FoldX, and report faster run-times. We show that the mutants generated by ProMuteHT are of high quality, as determined by all-atom and mutated-residue RMSD measurements against existing mutant structures in the PDB.
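As an illustration of the quality measure named above, the following minimal Python sketch computes an all-atom RMSD and a mutated-residue RMSD between a generated mutant and a reference structure. It is not ProMuteHT's code: the function names, the dictionary layout keyed by (residue number, atom name), and the assumption that the two structures are already superimposed are all hypothetical choices made for this example.

```python
import math

def rmsd(pairs):
    """Root-mean-square deviation over a list of ((x, y, z), (x, y, z)) pairs."""
    if not pairs:
        raise ValueError("no matched atoms to compare")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in pairs:
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(pairs))

def all_atom_rmsd(model, reference):
    """RMSD over every atom present in both structures.

    Both arguments map (residue_number, atom_name) -> (x, y, z).
    Assumption: the structures are already superimposed.
    """
    shared = sorted(set(model) & set(reference))
    return rmsd([(model[key], reference[key]) for key in shared])

def residue_rmsd(model, reference, residue_number):
    """RMSD restricted to the atoms of one residue, e.g. the mutated one."""
    shared = sorted(
        key for key in set(model) & set(reference) if key[0] == residue_number
    )
    return rmsd([(model[key], reference[key]) for key in shared])

if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    model = {(1, "CA"): (0.0, 0.0, 0.0), (2, "CA"): (1.0, 0.0, 0.0)}
    reference = {(1, "CA"): (0.0, 0.0, 0.0), (2, "CA"): (1.0, 0.0, 2.0)}
    print(all_atom_rmsd(model, reference))
    print(residue_rmsd(model, reference, 2))
```

Reporting the mutated-residue value alongside the all-atom value is useful because a single poorly placed side chain contributes little to an average taken over the whole protein.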