Learning Patterns for Building Resources about Semantic Relations in the Medical Domain

In this article, we present a method for automatically extracting semantic relations from medical texts using linguistic patterns. These patterns draw on three levels of information about words: inflected form, lemma and part-of-speech. The method first identifies the entities that take part in the relations to extract, namely diseases, exams, treatments, drugs and symptoms. Sentences that contain pairs of entities are then extracted, and the presence of a semantic relation is validated by applying linguistic patterns. These patterns were previously learnt automatically from a manually annotated corpus by relying on an algorithm based on the edit distance. We first report the results of an evaluation of our medical entity tagger for the five types of entities mentioned above and then, more globally, the results of an evaluation of our extraction method for four relations between these entities. Both evaluations were done for French.
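To make the pattern-learning step more concrete, the following is a minimal sketch of how a pattern could be induced from two annotated sentences with a token-level edit distance. It is an illustration under assumptions, not the algorithm of the article: the `Token` type, the cost values, the `lemma:`/`pos:` prefixes, the `*?` notation for an optional slot, and the example sentences with `<DRUG>` and `<DISEASE>` placeholders are all hypothetical. The idea is that aligned tokens are generalized to the most specific level they share (inflected form, then lemma, then part-of-speech), while unaligned tokens become optional slots.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str   # inflected form
    lemma: str  # lemma
    pos: str    # part-of-speech tag


# Assumed costs (hypothetical values): the more levels two tokens share,
# the cheaper it is to align them.
SAME_FORM, SAME_LEMMA, SAME_POS, DIFFERENT, GAP = 0, 1, 2, 3, 3


def substitution_cost(a: Token, b: Token) -> int:
    if a.form == b.form:
        return SAME_FORM
    if a.lemma == b.lemma:
        return SAME_LEMMA
    if a.pos == b.pos:
        return SAME_POS
    return DIFFERENT


def generalize(a: Token, b: Token) -> str:
    """Return the most specific level shared by two aligned tokens."""
    if a.form == b.form:
        return a.form
    if a.lemma == b.lemma:
        return f"lemma:{a.lemma}"
    if a.pos == b.pos:
        return f"pos:{a.pos}"
    return "*"  # any single token


def learn_pattern(s1: List[Token], s2: List[Token]) -> List[str]:
    """Align two sentences by edit distance and generalize the alignment."""
    n, m = len(s1), len(s2)
    # dist[i][j] = cost of aligning s1[:i] with s2[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * GAP
    for j in range(1, m + 1):
        dist[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j] + GAP,
                dist[i][j - 1] + GAP,
                dist[i - 1][j - 1] + substitution_cost(s1[i - 1], s2[j - 1]),
            )
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    pattern: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and dist[i][j]
                == dist[i - 1][j - 1] + substitution_cost(s1[i - 1], s2[j - 1])):
            pattern.append(generalize(s1[i - 1], s2[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + GAP:
            pattern.append("*?")  # token present only in s1: optional slot
            i -= 1
        else:
            pattern.append("*?")  # token present only in s2: optional slot
            j -= 1
    pattern.reverse()
    return pattern


# Hypothetical annotated sentences; entities are replaced by type placeholders.
drug = Token("<DRUG>", "<DRUG>", "ENT")
disease = Token("<DISEASE>", "<DISEASE>", "ENT")
sent_a = [drug, Token("traite", "traiter", "V"), Token("la", "le", "DET"), disease]
sent_b = [drug, Token("traitent", "traiter", "V"), Token("les", "le", "DET"), disease]
sent_c = [drug, Token("traite", "traiter", "V"),
          Token("efficacement", "efficacement", "ADV"),
          Token("la", "le", "DET"), disease]

print(learn_pattern(sent_a, sent_b))
print(learn_pattern(sent_a, sent_c))
```

In this sketch, two sentences that differ only in inflection yield a pattern at the lemma level, and an extra adverb in one sentence yields an optional slot. A learnt pattern of this kind could then be matched against new sentences containing a pair of tagged entities to validate a candidate relation.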
