Inferring molecular function: contributions from functional linkages.

In the current era of high-throughput sequencing and structure determination, functional annotation has become a bottleneck in biomedical science. Here, we show that automated inference of molecular function using functional linkages among genes increases the accuracy of functional assignments by ≥8% and enriches functional descriptions in ≥34% of top assignments. Furthermore, biochemical literature supports >80% of automated inferences for previously unannotated proteins. These results emphasize the benefit of incorporating functional linkages in protein annotation.
