Automatic Extraction of Biological Information from Scientific Text: Protein-Protein Interactions

We describe the basic design of a system that automatically detects protein-protein interactions in scientific abstracts. By restricting the problem domain and imposing a number of strong assumptions, including pre-specified protein names and a limited set of verbs that represent actions, we show that accurate information extraction is possible. We evaluate the system's performance on several real-world interaction networks, including Drosophila cell cycle control. The computational results are in good agreement with current biological knowledge and demonstrate the feasibility of developing a fully automated system able to describe networks of protein interactions with sufficient accuracy.
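
To make the two central assumptions concrete (a pre-specified list of protein names and a small set of action verbs), the following Python sketch shows one way such a restricted extractor could look. It is an illustration only, not the system described here: the protein list, the verb list, the function name `extract_interactions`, and the rule "nearest protein on each side of the verb within a sentence" are all assumptions introduced for this example.

```python
import re

# Hypothetical dictionaries, chosen only for illustration.
PROTEINS = ["Cdk2", "cyclin E", "Rb", "E2F"]
VERBS = ["activates", "binds", "inhibits", "phosphorylates"]


def _compile(terms):
    """Build a whole-word, case-sensitive pattern matching any of the terms."""
    # Longest terms first, so multi-word names are preferred over shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b")


PROTEIN_RE = _compile(PROTEINS)
VERB_RE = _compile(VERBS)


def extract_interactions(text):
    """Return (protein, verb, protein) triples found sentence by sentence.

    Assumed rule: for every known verb, pair the closest known protein
    before it with the closest known protein after it in the same sentence.
    """
    triples = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        proteins = [(m.start(), m.group()) for m in PROTEIN_RE.finditer(sentence)]
        for verb in VERB_RE.finditer(sentence):
            before = [name for pos, name in proteins if pos < verb.start()]
            after = [name for pos, name in proteins if pos > verb.start()]
            if before and after:
                triples.append((before[-1], verb.group(), after[0]))
    return triples
```

Under these assumptions, calling `extract_interactions("Cdk2 phosphorylates Rb. In turn, Rb inhibits E2F in resting cells.")` returns the two triples `("Cdk2", "phosphorylates", "Rb")` and `("Rb", "inhibits", "E2F")`, which can be read as directed edges of an interaction network. The sketch ignores passive voice, negation, and name variants; handling those is beyond a fixed-dictionary rule of this kind.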