REMed: automatic relation extraction from medical documents

The large number of unstructured medical documents written in natural language holds a vast amount of knowledge, and extracting that knowledge is valuable. An automatic relation identification strategy enables the discovery of relations, (possibly unknown) interactions, and associations between medical conditions, investigations, and treatments. This paper introduces REMed, a learning-based approach for the automatic discovery of relations between medical concepts. We propose an original set of 17 features grouped into four categories: lexical (3), context (6), grammatical (4), and syntactic (4). We analyzed the influence of each category on classification performance and found that REMed performs comparably to similar solutions. REMed achieves an overall F-measure of 74.9%, outperforming the best result reported by similar systems by 1.2%. This performance is driven mainly by the features in the lexical and context categories.
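The results above are stated as an F-measure. The sketch below shows how such a score is typically computed, assuming the standard balanced F-measure (F1, the harmonic mean of precision and recall) and micro-averaging over relation types. The abstract does not state which averaging REMed uses; the function names, relation labels, and counts are hypothetical and for illustration only.

```python
from typing import Dict, Tuple


def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positive, false positive and false negative counts.

    With beta = 1 this is the balanced F-measure (F1).
    """
    if tp == 0:
        # No correct predictions: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def micro_f_measure(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1: pool (TP, FP, FN) over all relation types, then score once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f_measure(tp, fp, fn)


if __name__ == "__main__":
    # Hypothetical per-relation counts: (true positives, false positives, false negatives).
    counts = {"treatment-problem": (8, 2, 2), "test-problem": (6, 4, 0)}
    print(f"micro F1 = {micro_f_measure(counts):.3f}")
```

With these made-up counts, the pooled precision is 14/20 = 0.70 and the pooled recall is 14/16 = 0.875, giving a micro F1 of 7/9 (about 0.778). A macro average would instead score each relation type separately and average the results.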