Novel Protein-Protein Interactions Inferred from Literature Context

We have developed a method that predicts protein-protein interactions (PPIs) based on the similarity of the contexts in which proteins appear in the literature. This method outperforms previously developed PPI prediction algorithms that rely on the conjunction of two protein names in MEDLINE abstracts. We show significant increases in coverage (76% versus 32%) and sensitivity (66% versus 41% at a specificity of 95%) for the prediction of PPIs currently archived in six PPI databases. A retrospective analysis shows that PPIs can be predicted efficiently before they enter PPI databases and before the interaction is explicitly described in the literature. The practical value of the method for the discovery of novel PPIs is illustrated by the experimental confirmation of the inferred physical interaction between CAPN3 and PARVB, a prediction that was based on the frequent co-occurrence of both proteins with concepts such as Z-disc, dysferlin, and alpha-actinin. The relationships between proteins predicted by our method are broader than PPIs and include proteins in the same complex or pathway. Depending on which types of relationship are deemed useful, the precision of our method can be as high as 90%. The full set of predicted interactions is available as a downloadable matrix and through the web tool Nermal, which lists the most likely interaction partners for a given protein. Our framework can be used to prioritize potential, hitherto undiscovered interaction partners for follow-up studies and to aid the generation of accurate protein interaction maps.
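The idea of comparing the literature contexts of two proteins can be illustrated with a minimal sketch. It assumes that each protein is represented by a profile of weighted concepts and that profiles are compared with cosine similarity; the abstract does not specify the weighting scheme or the similarity measure, so both are assumptions made for illustration. The function names, the weights, and the protein `UNRELATED_X` are hypothetical; only the concept names Z-disc, dysferlin, and alpha-actinin come from the text above.

```python
from math import sqrt


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two concept profiles (dict: concept -> weight)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[c] * profile_b[c] for c in shared)
    norm_a = sqrt(sum(w * w for w in profile_a.values()))
    norm_b = sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty profile has no context to compare
    return dot / (norm_a * norm_b)


def rank_partners(query, profiles):
    """Rank all other proteins by profile similarity to the query, highest first."""
    scores = [
        (name, cosine_similarity(profiles[query], profile))
        for name, profile in profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy profiles with made-up weights (not data from the study).
profiles = {
    "CAPN3": {"Z-disc": 0.8, "dysferlin": 0.6, "alpha-actinin": 0.4},
    "PARVB": {"Z-disc": 0.5, "dysferlin": 0.7, "alpha-actinin": 0.6, "integrin": 0.3},
    "UNRELATED_X": {"glucose metabolism": 0.9, "pancreas": 0.7},  # hypothetical protein
}

if __name__ == "__main__":
    for partner, score in rank_partners("CAPN3", profiles):
        print(f"{partner}\t{score:.3f}")
```

In this toy setting, two proteins that never need to be mentioned together can still score highly because they share weighted concepts, which is the property that lets a context-based approach propose interactions not yet described explicitly.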
