Towards Extracting Medical Family History from Natural Language Interactions: A New Dataset and Baselines

We introduce a new dataset of natural language interactions annotated with medical family histories. The interactions were collected during sessions with a genetic counselor and through crowdsourcing, following a questionnaire created by domain experts. We describe the data collection process and the annotations performed by medical professionals, which cover illnesses and personal attributes (name, age, gender, family relationships) for the patient and their family members. An initial system that performs argument identification and relation extraction shows promising results: an average F-score of 0.87 on the targeted relations in complex sentences.
