Iterative development of family history annotation guidelines using a synthetic corpus of clinical text

In this article, we describe the development of annotation guidelines for family history information in Norwegian clinical text. We make use of incrementally developed synthetic clinical text describing patients’ family history relating to cases of cardiac disease, and we present a general methodology that integrates the production of synthetic clinical statements with guideline development. We analyze inter-annotator agreement based on the developed guidelines and present results from machine learning experiments aimed at evaluating the validity and applicability of the annotated corpus. The resulting annotated corpus contains 477 sentences and 6030 tokens. Both the annotation guidelines and the annotated corpus are made freely available and as such constitute the first publicly available resource of Norwegian clinical text.
