Mining relations in the GENIA corpus

Discovering the interactions between genes and proteins is seen as one of the core tasks in molecular biology. The quantity of research results in this area is growing at such a rate that it is very difficult for individual researchers to keep track of them. As such results appear mainly in the form of scientific articles, these articles must be processed efficiently in order to extract the relevant results. Many databases aim to consolidate the newly gained knowledge in a format that is easily accessible and searchable; however, the creators of such databases normally rely on human readers who manually ‘curate’ the relevant papers. This is an expensive and time-consuming process, and there can be a significant time lag between the publication of a result and its introduction into such databases. In this paper we propose a method for discovering interactions between genes and proteins from the scientific literature, based on a complete syntactic analysis of the corpus. We report on preliminary results.
