Enhancing Medical Named Entity Recognition with Features Derived from Unsupervised Methods

A study of the usefulness of features derived from unsupervised methods for clinical named entity recognition is proposed. The features will be evaluated on two tasks: named entity recognition within a single clinical sub-domain, and adaptation of a named entity recognition model to a new clinical sub-domain. Four named entity types, all highly relevant for clinical information extraction, will be studied: Disorder, Finding, Pharmaceutical Drug and Body Structure. Named entity recognition will be performed using conditional random fields. The unsupervised features will be obtained by clustering the semantic representations of words in a random indexing word space.
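The pipeline described above can be illustrated with a minimal sketch, assuming a toy corpus and a toy cosine-based k-means. The function names, the dimensionality, the window size, the number of clusters and the example sentences are all hypothetical and are not taken from the study. The sketch builds a random indexing word space, clusters the resulting context vectors, and exposes each token's cluster identifier as a feature of the kind a conditional random fields toolkit could consume.

```python
import math
import random


def index_vector(rng, dim=64, nonzero=4):
    """Sparse ternary index vector: a few random +1/-1 entries, the rest 0."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def random_indexing(sentences, dim=64, nonzero=4, window=2, seed=0):
    """Context vector of a word = sum of the index vectors of its neighbours."""
    rng = random.Random(seed)
    vocab = sorted({w for sent in sentences for w in sent})
    index = {w: index_vector(rng, dim, nonzero) for w in vocab}
    context = {w: [0.0] * dim for w in vocab}
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    for k, val in enumerate(index[sent[j]]):
                        context[word][k] += val
    return context


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(context, k=2, iterations=10, seed=0):
    """Toy k-means (assumption): assign each word to its most similar centroid."""
    rng = random.Random(seed)
    words = sorted(context)
    centroids = [list(context[w]) for w in rng.sample(words, k)]
    assignment = {}
    for _ in range(iterations):
        assignment = {
            w: max(range(k), key=lambda c: cosine(context[w], centroids[c]))
            for w in words
        }
        for c in range(k):
            members = [context[w] for w in words if assignment[w] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def token_features(sentence, i, clusters):
    """Feature dict for token i: the word itself plus its cluster identifier."""
    word = sentence[i]
    return {"word": word, "cluster": str(clusters.get(word, "UNK"))}


# Hypothetical toy corpus, for illustration only.
sentences = [
    ["patient", "has", "fever"],
    ["patient", "has", "cough"],
    ["patient", "takes", "paracetamol"],
    ["patient", "takes", "ibuprofen"],
]
context = random_indexing(sentences)
clusters = cluster(context, k=2)
features = [[token_features(s, i, clusters) for i in range(len(s))] for s in sentences]
```

Words unseen in the unlabelled corpus receive the placeholder cluster `"UNK"`. A cluster feature of this kind gives the sequence model a way to generalise across words that never co-occur with a label in the annotated data, which is the property the proposed study sets out to evaluate for domain adaptation.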
