Protein asparagine deamidation prediction based on structures with machine learning methods

Chemical stability is a major concern in the development of protein therapeutics because of its impact on both efficacy and safety. Protein “hotspots” are amino acid residues that are subject to chemical modifications such as deamidation, isomerization, glycosylation, and oxidation. A more accurate method for predicting potential hotspot residues would allow them to be eliminated or reduced as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. The sequence-based prediction method simply flags the NG motif (an asparagine followed by a glycine) as liable to deamidation. Because of its convenience, it still dominates deamidation evaluation in most pharmaceutical settings. However, this simple sequence-based method is less accurate and often leads to over-engineering of a protein. We introduce structure-based prediction models built by mining available experimental and structural data on deamidated proteins. Our training set contains 194 Asn residues from 25 proteins, all of which have high-resolution crystal structures available. Experimentally measured deamidation half-lives of Asn in pentapeptides, together with 3D structure-based properties such as solvent exposure, crystallographic B-factors, local secondary structure, and dihedral angles, were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated and also tested on an external test data set. The random forest model showed high enrichment, ranking deamidated residues above non-deamidated residues while effectively eliminating false-positive predictions. Such quantitative protein structure–function relationship tools may also be applicable to the prediction of other protein hotspots. In addition, we discuss in detail the metrics used to evaluate prediction performance on unbalanced data sets such as the deamidation case.
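As a rough illustration of the sequence-based baseline described above, the following Python sketch flags every Asn that is immediately followed by Gly and scores those flags against observed labels. The function names, the toy sequence, and the labels are hypothetical and are not taken from the study. The Matthews correlation coefficient is used here only as one common choice for unbalanced data; the abstract does not say which metrics the authors adopted.

```python
import math
import re


def find_ng_motifs(sequence):
    """Return 1-based positions of Asn (N) residues immediately followed by Gly (G)."""
    return [m.start() + 1 for m in re.finditer(r"N(?=G)", sequence.upper())]


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = deamidated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy sequence; Asn residues sit at positions 3, 7, 10, and 13.
sequence = "MKNGSANDTNGLNP"
asn_positions = [i + 1 for i, aa in enumerate(sequence) if aa == "N"]
flagged = set(find_ng_motifs(sequence))

# Sequence-based prediction: an Asn is called liable only if it is in an NG motif.
y_pred = [1 if pos in flagged else 0 for pos in asn_positions]

# Hypothetical observations: only the Asn at position 3 actually deamidates.
y_true = [1, 0, 0, 0]

print("NG motif positions:", sorted(flagged))
print("MCC of NG-motif rule:", round(matthews_corrcoef(y_true, y_pred), 3))
```

In this toy case the NG rule flags two residues but only one deamidates, which is the kind of false positive that can lead to unnecessary engineering. A structure-based model would replace the single motif test with features such as solvent exposure or B-factors.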
