Improved prediction of critical residues for protein function based on network and phylogenetic analyses

Background: Phylogenetic approaches are commonly used to predict which amino acid residues are critical to the function of a given protein. However, such approaches have inherent limitations, such as the need to identify multiple homologues of the protein under consideration. Complementary or alternative approaches for predicting critical residues would therefore be desirable. Network analyses have been used to model many complex biological systems, but only very recently have they been used to predict critical residues from a protein's three-dimensional structure. Here we compare a couple of phylogenetic approaches with several network-based methods for predicting critical residues, and show that a combination of one phylogenetic method and one network-based method is superior to the other methods previously employed.

Results: We associate a network with each member of a set of proteins for which the three-dimensional structure is known and the critical residues have previously been determined experimentally. We show that several network-based centrality measurements (connectivity, 2-connectivity, closeness centrality, betweenness and cluster coefficient) accurately detect residues critical for the protein's function. Phylogenetic approaches yield predictions as reliable as the network-based measurements, although, interestingly, the two general approaches tend to predict different sets of critical residues. Hence we propose a hybrid method composed of one network-based calculation (the closeness centrality) and one phylogenetic approach (the Conseq server). This hybrid approach predicts critical residues more accurately than the other methods tested here.

Conclusion: We show that network analysis can be used to improve the prediction of amino acids critical for protein function when used in combination with phylogenetic approaches. We propose that this improvement is due to the complementary nature of the two approaches. Network-based methods tend to predict as critical those residues that are highly connected and internal (i.e., non-surface), although network analyses do identify some surface residues as critical. Residues chosen by phylogenetic approaches, in contrast, display a lower overall probability of being surface inaccessible.
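As an illustration of the network-based measurement named above, the following minimal sketch computes closeness centrality on a small residue contact network. The toy contact map, the residue labels and the function name are hypothetical. The abstract does not specify how the networks were built or how closeness was normalised, so this uses the textbook definition (number of other reachable nodes divided by the sum of shortest-path distances to them) and should be read as a sketch under those assumptions, not as the authors' implementation.

```python
from collections import deque


def closeness_centrality(adjacency):
    """Closeness of each node in an unweighted, undirected graph.

    adjacency maps each node to a list of its neighbours.
    Closeness = (reachable nodes - 1) / (sum of shortest-path distances).
    """
    scores = {}
    for source in adjacency:
        # Breadth-first search gives shortest-path lengths from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


# Hypothetical toy contact map: five residues in a chain A-B-C-D-E.
# In practice, edges would come from a structure-derived contact
# criterion (an assumption here, not stated in the abstract).
toy_contacts = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

scores = closeness_centrality(toy_contacts)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy chain the middle residue has the highest closeness, which matches the intuition that network-based methods favour well-connected, internal positions.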
