Intelligent Word Embeddings of Free-Text Radiology Reports

Radiology reports are a rich resource for advancing deep learning applications in medicine, offering a large volume of data that is continuously updated, integrated, and shared. However, the ambiguity and subtlety of natural language pose significant challenges. We propose a hybrid strategy that combines semantic-dictionary mapping and word2vec modeling to create dense vector embeddings of free-text radiology reports. Our method leverages the benefits of both semantic-dictionary mapping and unsupervised learning. Using the vector representation, we automatically classify the radiology reports into three classes denoting the interpreting radiologist's confidence in the diagnosis of intracranial hemorrhage. We performed experiments with varying hyperparameter settings of the word embeddings and a range of classifiers. The best performance achieved was a weighted precision of 88% and a weighted recall of 90%. Our work offers the potential to leverage unstructured electronic health record data by allowing direct analysis of narrative clinical notes.
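
The hybrid idea can be illustrated with a minimal sketch, under stated assumptions. The dictionary entries, the controlled terms `ICH` and `NEGATED`, the two-dimensional toy vectors, and the helper names `map_terms` and `embed_report` are all hypothetical and chosen only for illustration. In practice the vectors would come from a word2vec model trained on the mapped corpus. Averaging word vectors into a report vector is likewise an assumption, since the abstract does not state how report-level embeddings are formed.

```python
import re

# Hypothetical semantic dictionary: surface phrase -> controlled term.
SEMANTIC_MAP = {
    "intracranial hemorrhage": "ICH",
    "intracranial bleed": "ICH",
    "no evidence of": "NEGATED",
}

# Toy 2-d vectors standing in for vectors a word2vec model would learn.
TOY_VECTORS = {
    "ICH": [1.0, 0.0],
    "NEGATED": [0.0, 1.0],
    "acute": [0.5, 0.5],
}


def map_terms(text):
    """Lowercase the text, replace known phrases with controlled terms, tokenize."""
    text = text.lower()
    # Replace longer phrases first so overlapping entries do not clash.
    for phrase in sorted(SEMANTIC_MAP, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, SEMANTIC_MAP[phrase], text)
    return re.findall(r"[A-Za-z]+", text)


def embed_report(tokens, vectors):
    """Average the vectors of in-vocabulary tokens; return zeros if none match."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


if __name__ == "__main__":
    tokens = map_terms("No evidence of acute intracranial hemorrhage.")
    print(tokens)
    print(embed_report(tokens, TOY_VECTORS))
```

The resulting fixed-length report vector is the kind of input a standard classifier could consume to predict one of the three confidence classes. Mapping before embedding means that synonymous phrases share a single token, and therefore a single vector.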
