Graphical Models of Residue Coupling in Protein Families

Many statistical measures and algorithmic techniques have been proposed for studying residue coupling in protein families. Generally speaking, two residue positions are considered coupled if, in the sequence record, some of their amino acid type combinations are significantly more common than others. While the proposed approaches have proven useful in finding and describing coupling, a significant missing component is a formal probabilistic model that explicates and compactly represents the coupling, integrates information about sequence, structure, and function, and supports inferential procedures for analysis, diagnosis, and prediction.

We present an approach to learning and using probabilistic graphical models of residue coupling. These models capture significant conservation and coupling constraints observable in a multiply-aligned set of sequences. Our approach can place a structural prior on considered couplings, so that all identified relationships have direct mechanistic explanations. It can also incorporate information about functional classes, and thereby learn a differential graphical model that distinguishes constraints common to all classes from those unique to individual classes. Such differential models separately account for class-specific conservation and family-wide coupling, two different sources of sequence covariation. They are then able to perform interpretable functional classification of new sequences, explaining classification decisions in terms of the underlying conservation and coupling constraints. We apply our approach in studies of both G protein-coupled receptors and PDZ domains, identifying and analyzing family-wide and class-specific constraints, and performing functional classification. The results demonstrate that graphical models of residue coupling provide a powerful tool for uncovering, representing, and utilizing significant sequence-structure-function relationships in protein families.
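
The working notion of coupling above (some amino acid combinations at two positions occurring more often than independence would predict) can be illustrated with a small sketch. It scores a pair of alignment columns by mutual information, which is a common generic covariation measure and is assumed here purely for illustration; it is not necessarily the statistic used in this work. The function name `mutual_information` and the toy alignment `toy_msa` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences. This is a generic
    covariation score assumed for illustration, not the paper's own model.
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    col_i_counts = Counter(seq[i] for seq in msa)
    col_j_counts = Counter(seq[j] for seq in msa)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = col_i_counts[a] / n
        p_b = col_j_counts[b] / n
        # Contribution is positive when (a, b) is more common than
        # expected if the two columns varied independently.
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (made up): columns 0 and 1 always change together,
# while column 2 is fully conserved.
toy_msa = ["AKL", "AKL", "GEL", "GEL", "AKL", "GEL"]

mi_01 = mutual_information(toy_msa, 0, 1)  # covarying pair
mi_02 = mutual_information(toy_msa, 0, 2)  # conserved column carries no coupling signal
print(f"MI(0,1) = {mi_01:.3f} bits, MI(0,2) = {mi_02:.3f} bits")
```

In this toy case the covarying pair scores one bit and the pair involving the conserved column scores zero, which reflects why conservation and coupling are treated as distinct kinds of constraint. A pairwise score like this does not by itself separate direct from indirect dependencies or account for functional classes, which is the gap a full probabilistic graphical model is meant to address.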
