Using kernelized partial canonical correlation analysis to study directly coupled side chains and allostery in small G proteins

Motivation: Inferring structural dependencies among a protein’s side chains helps us understand their coupled motions. It is known that coupled fluctuations can reveal pathways of communication used for information propagation in a molecule. Side-chain conformations are commonly represented by multivariate angular variables, but existing partial correlation methods that can be applied to this inference task are not capable of handling multivariate angular data. We propose a novel method to infer direct couplings from this type of data, and show that this method is useful for identifying functional regions and their interactions in allosteric proteins.

Results: We developed a novel extension of canonical correlation analysis (CCA), which we call ‘kernelized partial CCA’ (or simply KPCCA), and used it to infer direct couplings between side chains, while disentangling these couplings from indirect ones. Using the conformational information and fluctuations of the inactive structure alone for allosteric proteins in the Ras and other Ras-like families, our method identified allosterically important residues not only as strongly coupled ones but also in densely connected regions of the interaction graph formed by the inferred couplings. Our results were in good agreement with other empirical findings. By studying distinct members of the Ras, Rho and Rab sub-families, we show further that KPCCA was capable of inferring common allosteric characteristics in the small G protein super-family.

Availability and implementation: https://github.com/lsgh/ismb15

Contact: lsoltang@uwaterloo.ca
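To make the distinction between direct and indirect couplings in the Results concrete, the following is a minimal sketch of a plain linear partial CCA on angular data. It is not the KPCCA method itself and is not taken from the linked repository: the cosine/sine embedding stands in for a kernel, and all function names (`embed_angles`, `residualize`, `canonical_correlations`, `partial_cca`) and the `reg` ridge parameter are hypothetical choices made for illustration.

```python
import numpy as np


def embed_angles(theta):
    """Map angles in radians, shape (n_samples, n_angles), to cos/sin features."""
    theta = np.asarray(theta, dtype=float)
    if theta.ndim == 1:
        theta = theta.reshape(-1, 1)
    return np.hstack([np.cos(theta), np.sin(theta)])


def residualize(a, z):
    """Remove the part of `a` that is linearly explained by `z` plus an intercept."""
    design = np.hstack([np.ones((z.shape[0], 1)), z])
    coef, *_ = np.linalg.lstsq(design, a, rcond=None)
    return a - design @ coef


def canonical_correlations(x, y, reg=1e-8):
    """Canonical correlations between two feature blocks (assumed ridge term `reg`)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / n + reg * np.eye(x.shape[1])
    cyy = y.T @ y / n + reg * np.eye(y.shape[1])
    cxy = x.T @ y / n
    # Whiten both blocks with Cholesky factors, then take singular values.
    lx = np.linalg.cholesky(cxx)
    ly = np.linalg.cholesky(cyy)
    whitened = np.linalg.solve(lx, cxy) @ np.linalg.inv(ly).T
    return np.clip(np.linalg.svd(whitened, compute_uv=False), 0.0, 1.0)


def partial_cca(theta_x, theta_y, theta_z):
    """Canonical correlations of x and y after regressing out z (all angular)."""
    fx, fy, fz = (embed_angles(t) for t in (theta_x, theta_y, theta_z))
    return canonical_correlations(residualize(fx, fz), residualize(fy, fz))
```

In this sketch, two side chains whose angles both follow a third one show a large ordinary canonical correlation but a small partial one once the third is conditioned on, which is the sense in which an indirect coupling is separated from a direct one.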
