Paper information - Multilingual Named-Entity Recognition from Parallel Corpora

We present a named-entity recognition (NER) system for parallel multilingual text. The system handles three languages (English, French, and Spanish) and is tailored to the biomedical domain. For each language, we design a supervised, knowledge-based CRF model that draws on rich biomedical and general-domain information. To transfer the predictions made by each individual language model to the other parallel languages, we use the sentence alignment of the parallel corpora, the word alignment generated by the GIZA++ [8] tool, and Wikipedia-based word alignment. We then re-train each individual language system on the transferred predictions to produce a final enriched NER model for each language. Owing to the predictions transferred from the other language systems, the enriched system performs better than the initial system. Each language model also benefits from the external knowledge extracted from biomedical and general-domain resources.
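The transfer step described in the abstract can be illustrated with a small label-projection sketch. The abstract does not specify how predictions are transferred, so everything here is an assumption made for illustration: the function name `project_labels`, the BIO tagging scheme, the entity types `CHEM` and `DISO`, and the representation of a word alignment as a list of (source index, target index) pairs.

```python
from typing import List, Optional, Tuple


def project_labels(
    src_labels: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project BIO labels from source tokens onto target tokens.

    Hypothetical helper: `alignment` holds (source_index, target_index)
    pairs, e.g. as parsed from a word aligner's output.
    """
    # Entity type projected onto each target token (None = no entity).
    tgt_types: List[Optional[str]] = [None] * tgt_len
    for src_i, tgt_j in alignment:
        label = src_labels[src_i]
        # Keep the first entity type that reaches a target token.
        if label != "O" and tgt_types[tgt_j] is None:
            tgt_types[tgt_j] = label.split("-", 1)[1]

    # Rebuild BIO tags: a contiguous run of one type becomes one entity.
    tgt_labels: List[str] = []
    prev: Optional[str] = None
    for ent_type in tgt_types:
        if ent_type is None:
            tgt_labels.append("O")
        elif ent_type == prev:
            tgt_labels.append("I-" + ent_type)
        else:
            tgt_labels.append("B-" + ent_type)
        prev = ent_type
    return tgt_labels


# English: "Aspirin reduces fever" -> French: "L' aspirine réduit la fièvre"
english_labels = ["B-CHEM", "O", "B-DISO"]
en_to_fr = [(0, 1), (1, 2), (2, 4)]
print(project_labels(english_labels, en_to_fr, tgt_len=5))
```

Projected labels of this kind could then serve as additional training data when re-training the target-language model. This sketch splits a multiword entity whenever an unaligned target token falls inside it, and it merges adjacent entities of the same type, so a practical system would need extra handling for both cases.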

[1] Olivier Bodenreider, et al. Aggregating UMLS Semantic Types for Reducing Conceptual Complexity, 2001, MedInfo.

[2] Alan R. Aronson, et al. An overview of MetaMap: historical perspective and recent advances, 2010, J. Am. Medical Informatics Assoc.

[3] Sanna Salanterä, et al. Overview of the ShARe/CLEF eHealth Evaluation Lab 2013, 2013, CLEF.

[4] Clement J. McDonald, et al. What can natural language processing do for clinical decision support?, 2009, J. Biomed. Informatics.

[5] Olivier Bodenreider, et al. The Unified Medical Language System (UMLS): integrating biomedical terminology, 2004, Nucleic Acids Res.

[6] Christian Nøhr, et al. Comparing Approaches to Measuring the Adoption and Usability of Electronic Health Records: Lessons Learned from Canada, Denmark and Finland, 2013, MedInfo.

[7] Dan Klein, et al. Accurate Unlexicalized Parsing, 2003, ACL.

[8] Montserrat Marimon, et al. The Spanish Resource Grammar: Pre-processing Strategy and Lexical Acquisition, 2007, ACL 2007.

[9] Pascal Denis, et al. Statistical French Dependency Parsing: Treebank Conversion and First Results, 2010, LREC.