Creating and evaluating a consensus for negated and speculative words in a Swedish clinical corpus

In this paper we describe the creation of a consensus corpus obtained by combining three individual annotations of the same Swedish clinical corpus. The consensus was created automatically by applying a few basic rules. The corpus is annotated for negation words, speculative words, uncertain expressions and certain expressions. We evaluated the consensus by using it for negation and speculation cue detection. For training and detection we used Stanford NER, which is based on the machine learning algorithm Conditional Random Fields (CRF). For comparison, we also trained Stanford NER on the clinical part of the BioScope Corpus. On our Swedish clinical consensus corpus we obtained a precision of 87.9 percent and a recall of 91.7 percent for negation cues. On the English BioScope Corpus we obtained a precision of 97.6 percent and a recall of 96.7 percent for negation cues.
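
The abstract does not spell out the consensus rules. As a minimal sketch, assuming the three annotations are token-aligned label sequences and that a simple majority vote decides each token (a hypothetical rule chosen for illustration, not necessarily the rules used in this work), the merge could look like this. The label names, the "O" tag for unannotated tokens, and the helpers `consensus_label`, `build_consensus` and `to_stanford_tsv` are all hypothetical. The last helper writes one tab-separated token and label per line, the column layout Stanford NER reads when its properties file maps columns with `map = word=0,answer=1`.

```python
from collections import Counter

# Hypothetical label names based on the four annotation classes in the abstract;
# "O" (outside) is an assumed tag for tokens that carry no annotation.
LABELS = {"NEGATION", "SPECULATIVE", "UNCERTAIN", "CERTAIN", "O"}


def consensus_label(votes: list[str]) -> str:
    """Return the label given by at least two of the three annotators.

    Majority voting is an illustrative assumption, not the paper's rule set.
    If all three annotators disagree, this sketch falls back to "O".
    """
    if len(votes) != 3:
        raise ValueError("expected exactly three votes per token")
    unknown = set(votes) - LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else "O"


def build_consensus(annotations: list[list[str]]) -> list[str]:
    """Merge three token-aligned label sequences into one consensus sequence."""
    if len(annotations) != 3 or len({len(seq) for seq in annotations}) != 1:
        raise ValueError("need three token-aligned annotations of equal length")
    return [consensus_label(list(votes)) for votes in zip(*annotations)]


def to_stanford_tsv(tokens: list[str], labels: list[str]) -> str:
    """Emit one 'token<TAB>label' line per token (read with map = word=0,answer=1)."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return "".join(f"{tok}\t{lab}\n" for tok, lab in zip(tokens, labels))
```

For example, for the Swedish sentence "Patienten har inte feber" ("The patient has no fever"), if two annotators mark "inte" as NEGATION and the third leaves it unmarked, `build_consensus` keeps NEGATION for "inte". A stricter rule, such as requiring all three annotators to agree, would trade recall for precision in the consensus.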

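Precision and recall for cue detection come from comparing predicted cues against gold-standard cues. The sketch below assumes exact matching on hypothetical (sentence index, token index, label) triples; partial-match scoring would give different numbers. The helpers `score_cues` and `f1` are illustrative names. The F1 score (the harmonic mean of precision and recall) is not reported in the abstract; the values in the note after the code are derived here from the reported precision and recall.

```python
# Hypothetical cue encoding: (sentence index, token index, label).
Cue = tuple[int, int, str]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score_cues(gold: set[Cue], predicted: set[Cue]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of cue triples."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)
```

Applied to the reported negation-cue figures, `f1(0.879, 0.917)` gives about 0.898 for the Swedish consensus corpus and `f1(0.976, 0.967)` gives about 0.971 for the clinical part of the BioScope Corpus.
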