PIELG: A Protein Interaction Extraction Systemusing a Link Grammar Parser from Biomedical Abstracts

Due to the ever growing amount of publications about protein-protein interactions, information extraction from text is increasingly recognized as one of crucial technologies in bioinformatics. This paper presents a Protein Interaction Extraction System using a Link Grammar Parser from biomedical abstracts (PIELG). PIELG uses linkage given by the Link Grammar Parser to start a case based analysis of contents of various syntactic roles as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verbs patterns to overcome preposition combinations problems. The recall and precision are 74.4% and 62.65%, respectively. Experimental evaluations with two other state-of-the-art extraction systems indicate that PIELG system achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The result shows that the performance is remarkably promising. Keywords—Link Grammar Parser, Interaction extraction, protein-protein interaction, Natural language processing.

[1]  Daniel Dominic Sleator,et al.  Parsing English with a Link Grammar , 1995, IWPT.

[2]  John D. Lafferty,et al.  A Robust Parsing Algorithm for Link Grammars , 1995, IWPT.

[3]  Peter Szolovits,et al.  Adding a Medical Lexicon to an English Parser , 2003, AMIA.

[4]  Park,et al.  Identifying the Interaction between Genes and Gene Products Based on Frequently Seen Verbs in Medline Abstracts. , 1998, Genome informatics. Workshop on Genome Informatics.

[5]  Michael Krauthammer,et al.  GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles , 2001, ISMB.

[6]  Miguel A. Andrade-Navarro,et al.  Automatic Extraction of Biological Information from Scientific Text: Protein-Protein Interactions , 1999, ISMB.

[7]  Jian Su,et al.  Protein-Protein Interaction Extraction: A Supervised Learning Approach} , 2005 .

[8]  Ioannis Xenarios,et al.  DIP: the Database of Interacting Proteins , 2000, Nucleic Acids Res..

[9]  Hasan Davulcu,et al.  IntEx: A Syntactic Role Driven Protein-Protein Interaction Extractor for Bio-Medical Text , 2005, LBLODMBS@IDMB.

[10]  Tapio Salakoski,et al.  Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches , 2006, BMC Bioinformatics.

[11]  Ng,et al.  Toward Routine Automatic Pathway Discovery from On-line Scientific Text Abstracts. , 1999, Genome informatics. Workshop on Genome Informatics.

[12]  Tapio Salakoski,et al.  Evaluation of two dependency parsers on biomedical corpus targeted at protein-protein interactions , 2006, Int. J. Medical Informatics.

[13]  T. Takagi,et al.  Toward information extraction: identifying protein names from biological papers. , 1998, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[14]  Hsinchun Chen,et al.  A shallow parser based on closed-class words to capture relations in biomedical text , 2003, J. Biomed. Informatics.

[15]  Claudio Giuliano,et al.  Exploiting Shallow Linguistic Information for Relation Extraction from Biomedical Literature , 2006, EACL.

[16]  C. Ouzounis,et al.  Automatic extraction of protein interactions from scientific abstracts. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[17]  Xiao Zeng,et al.  A WEB-Based Version of MedLEE: A Medical Language Extraction and Encoding System. , 1996 .

[18]  K. R. Ramakrishnan,et al.  Event information extraction using link grammar , 2003, Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation.

[19]  A. Valencia,et al.  A gene network for navigating the literature , 2004, Nature Genetics.

[20]  Jun Xu,et al.  Extracting biochemical interactions from MEDLINE using a link grammar parser , 2003, Proceedings. 15th IEEE International Conference on Tools with Artificial Intelligence.

[21]  Silke Gerstmann Signaling PAthway Database (SPAD) ??ms an upcomingonline database on signal transduction , 2002 .