Towards a better understanding of uncertainties and speculations in Swedish clinical text – Analysis of an initial annotation trial

Electronic Health Records (EHRs) contain a large amount of free-text documentation that is potentially very useful for Information Retrieval and Text Mining applications. In an initial annotation trial, we annotated 6 739 sentences, randomly extracted from a corpus of Swedish EHRs, for sentence-level (un)certainty and for token-level speculative keywords and negations. The set is divided by clinical practice and analyzed using descriptive statistics and pairwise Inter-Annotator Agreement (IAA), measured by F1-score. We identify geriatrics as a clinical practice with a low average number of uncertain sentences and a high average IAA, and neurology as one with a high average number of uncertain sentences. Speculative words are often n-grams, and uncertain sentences are longer than average. The results of this analysis will be used to create a new annotated corpus, in which we will refine and further develop the initial annotation guidelines and introduce additional levels of dimensionality. Once the guidelines are finalized and the annotations refined, we plan to release the corpus for further research, after ensuring that it contains no identifiable information.
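
The abstract reports pairwise IAA measured by F1-score. The following is a minimal sketch of one common way to compute it for sentence-level labels. It is not the authors' implementation. It treats one annotator as the reference and the other as the prediction for a single target class, then averages over all annotator pairs. The annotator IDs, the label names "certain" and "uncertain", and the convention of returning 1.0 when neither annotator uses the target label are all hypothetical assumptions. Token-level agreement on speculative keywords and negations would need span matching and is not covered here.

```python
from itertools import combinations


def pairwise_f1(labels_a, labels_b, target):
    """F1 for one class, with annotator A as reference and B as prediction.

    F1 = 2*TP / (2*TP + FP + FN). Swapping A and B swaps FP and FN,
    so the score is symmetric in the two annotators.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same sentences")
    tp = sum(1 for a, b in zip(labels_a, labels_b) if a == target and b == target)
    fp = sum(1 for a, b in zip(labels_a, labels_b) if a != target and b == target)
    fn = sum(1 for a, b in zip(labels_a, labels_b) if a == target and b != target)
    denom = 2 * tp + fp + fn
    # Assumed convention: if neither annotator used the label, count it as full agreement.
    return 1.0 if denom == 0 else 2 * tp / denom


def average_pairwise_f1(annotations, target):
    """Average F1 over every unordered pair of annotators.

    `annotations` maps a (hypothetical) annotator ID to a list of per-sentence labels.
    Returns (average, {pair: score}).
    """
    pairs = list(combinations(sorted(annotations), 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    scores = {(x, y): pairwise_f1(annotations[x], annotations[y], target) for x, y in pairs}
    return sum(scores.values()) / len(scores), scores


# Toy example with hypothetical annotators and four sentences.
annotations = {
    "A1": ["certain", "uncertain", "uncertain", "certain"],
    "A2": ["certain", "uncertain", "certain", "certain"],
    "A3": ["uncertain", "uncertain", "uncertain", "certain"],
}
avg, per_pair = average_pairwise_f1(annotations, "uncertain")
for pair, score in per_pair.items():
    print(pair, round(score, 3))
print("average", round(avg, 3))
```

Because pairwise F1 is symmetric, it does not matter which annotator in a pair is treated as the reference. This is one reason F1 is sometimes preferred over chance-corrected measures when the negative class is hard to define, as it is for token-level keywords.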