Detection of native and mirror protein structures based on Ramachandran plot analysis by interpretable machine learning models

This contribution addresses the discrimination between native and mirror models of proteins, which differ in their chirality, using the structural information contained in the Ramachandran plots of the protein models. We classify these plots with an interpretable machine learning classifier, the Generalized Matrix Learning Vector Quantizer (GMLVQ). With this tool, we distinguish mirror from native structures with high accuracy by evaluating the Ramachandran plots alone. The classifier model additionally indicates how important individual regions of the plot, e.g. those corresponding to α-helices and β-strands, are for discriminating the structures precisely. This importance weighting differs among the protein classes considered.
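To make the idea concrete, the following is a minimal sketch of how a Ramachandran plot could be turned into a feature vector and compared against class prototypes with the adaptive distance used by the Generalized Matrix Learning Vector Quantizer, d(x, w) = (x - w)^T Ω^T Ω (x - w). It is an illustration under stated assumptions, not the procedure of this work: the bin count, the synthetic helix-like angles, the single prototype per class, and all function names are hypothetical, and Ω is left as the identity because the training step (stochastic gradient updates of the prototypes and of Ω) is omitted. The sketch relies on the fact that mirroring a structure negates its backbone dihedral angles, so a mirror model populates the point-reflected region of the plot.

```python
import random

BINS = 12  # bins per axis; an illustrative choice, not taken from the paper


def ramachandran_histogram(phi, psi, bins=BINS):
    """Flattened, normalized 2D histogram of (phi, psi) angles in degrees."""
    width = 360.0 / bins
    hist = [0.0] * (bins * bins)
    for p, s in zip(phi, psi):
        # Angles are assumed to lie in [-180, 180]; 180 goes into the last bin.
        i = min(int((p + 180.0) // width), bins - 1)
        j = min(int((s + 180.0) // width), bins - 1)
        hist[i * bins + j] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def gmlvq_distance(x, w, omega):
    """d(x, w) = (x - w)^T Omega^T Omega (x - w), i.e. ||Omega (x - w)||^2."""
    diff = [a - b for a, b in zip(x, w)]
    projected = [sum(o * d for o, d in zip(row, diff)) for row in omega]
    return sum(v * v for v in projected)


def classify(x, prototypes, omega):
    """Nearest-prototype rule; prototypes is a list of (label, vector) pairs."""
    return min(prototypes, key=lambda lw: gmlvq_distance(x, lw[1], omega))[0]


def relevances(omega):
    """Diagonal of Lambda = Omega^T Omega: one non-negative weight per bin."""
    n = len(omega[0])
    return [sum(row[k] ** 2 for row in omega) for k in range(n)]


def sample_helix(rng, n=200, mirror=False):
    """Synthetic angles near the right-handed alpha-helix region (-60, -45).

    Hypothetical stand-in for real data; mirroring negates both angles.
    """
    sign = -1.0 if mirror else 1.0
    phi = [sign * rng.gauss(-60.0, 10.0) for _ in range(n)]
    psi = [sign * rng.gauss(-45.0, 10.0) for _ in range(n)]
    return phi, psi


rng = random.Random(0)
dim = BINS * BINS
# Untrained relevance matrix: the identity reduces d to squared Euclidean distance.
omega = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
prototypes = [
    ("native", ramachandran_histogram(*sample_helix(rng))),
    ("mirror", ramachandran_histogram(*sample_helix(rng, mirror=True))),
]
query = ramachandran_histogram(*sample_helix(rng))
predicted = classify(query, prototypes, omega)
print(predicted)
```

After training, the diagonal returned by `relevances` would give one weight per plot bin, which is the kind of importance weighting over plot regions that makes the classifier interpretable; with the identity matrix used here all bins are weighted equally.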
