Enhanced PIELG: A Protein Interaction Extraction System Using a Link Grammar Parser from Biomedical Abstracts

Because the number of publications about protein-protein interactions keeps growing, information extraction from text is increasingly recognized as one of the crucial technologies in bioinformatics. This paper investigates the effect of adding a new module, the Complex Sentence Processor (CSP), to PIELG, a Protein Interaction Extraction System using a Link Grammar Parser from biomedical abstracts. PIELG uses the linkage produced by the Link Grammar Parser to start a case-based analysis of the contents of various syntactic roles and of their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verb patterns to overcome problems caused by preposition combinations. In the enhanced system, recall and precision rise to 49.33% and 65.16%, respectively. Experimental comparisons with two other state-of-the-art extraction systems indicate that the enhanced PIELG system performs better, and the results are promising.
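To make the idea of phrasal-prepositional verb patterns more concrete, here is a minimal sketch, assuming a sentence's linkage is available as a list of (left word, link label, right word) triples. The link labels (Ss for subject-verb, MVp for verb-to-preposition, Js for preposition-to-object) are modeled loosely on Link Grammar connector names and are written by hand, not produced by calling the parser. The pattern table, the function name `extract_interactions`, and the example proteins are hypothetical illustrations, not PIELG's actual lexicon or code. The point is that a verb such as "interacts" only yields an interaction when it is paired with a fitting preposition ("with"), so a mismatched combination such as "binds with" is rejected.

```python
from typing import NamedTuple


class Link(NamedTuple):
    left: str   # word at the left end of the link
    label: str  # connector type, e.g. "Ss", "MVp", "Js" (Link Grammar style)
    right: str  # word at the right end of the link


# Hypothetical phrasal-prepositional verb patterns: verb -> prepositions that
# may introduce the second interactor. Illustrative only, not PIELG's lexicon.
PHRASAL_PATTERNS = {
    "interacts": {"with"},
    "binds": {"to"},
    "associates": {"with"},
    "dissociates": {"from"},
}


def extract_interactions(links, proteins):
    """Return (agent, verb, preposition, target) tuples found in one linkage.

    Simplification: words are matched by surface string, so a sentence that
    repeats the same word twice is not disambiguated.
    """
    results = []
    for subj in links:
        # Subject link: protein on the left, verb on the right.
        if not subj.label.startswith("S"):
            continue
        agent, verb = subj.left, subj.right
        if agent not in proteins or verb not in PHRASAL_PATTERNS:
            continue
        for mv in links:
            # Verb-to-preposition link whose preposition fits this verb.
            if (mv.label.startswith("MV") and mv.left == verb
                    and mv.right in PHRASAL_PATTERNS[verb]):
                for obj in links:
                    # Preposition-to-object link naming the second protein.
                    if (obj.label.startswith("J") and obj.left == mv.right
                            and obj.right in proteins):
                        results.append((agent, verb, mv.right, obj.right))
    return results


if __name__ == "__main__":
    # "RAD51 interacts with BRCA2": the verb-preposition pair matches a pattern.
    ok = [Link("RAD51", "Ss", "interacts"),
          Link("interacts", "MVp", "with"),
          Link("with", "Js", "BRCA2")]
    print(extract_interactions(ok, {"RAD51", "BRCA2"}))

    # "p53 binds with MDM2": "binds" expects "to", so nothing is extracted.
    bad = [Link("p53", "Ss", "binds"),
           Link("binds", "MVp", "with"),
           Link("with", "Js", "MDM2")]
    print(extract_interactions(bad, {"p53", "MDM2"}))
```

A real system would take the triples from the parser's linkage output and would need a much larger, curated verb-preposition table; keeping the table separate from the matching loop makes it easy to extend without touching the traversal logic.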
