Clinically Significant Information Extraction from Radiology Reports

Radiology reports are one of the most important medical documents that a diagnostician looks into, especially in the emergency context. They provide the emergency physicians with critical information regarding the condition of the patient and help the physicians take immediate action on urgent conditions. However, the reports are in the form of unstructured text, which makes them time consuming for humans to interpret. We have developed a machine learning system to (a) efficiently extract the clinically significant parts and their level of importance in radiology reports, and (b) to classifies the overall report into critical or non-critical categories which help doctors to identify potential high priority reports. As a starting point, the system uses anonymized chest X-RAY reports of adults and provides three levels of importance for medical phrases. We used the Conditional Random Field (CRF) model to identify clinically significant phrases with an average f1-score of 0.75. The proposed system includes a web-based interface which highlights the medical phrases, and their level of importance to the emergency physician. The overall classification of the report is performed using the phrases extracted from the CRF model as features for the classifier. Average accuracy achieved is 85%.

[1]  Sunghwan Sohn,et al.  Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications , 2010, J. Am. Medical Informatics Assoc..

[2]  Curtis P. Langlotz,et al.  Enhancing the expressiveness and usability of structured image reporting systems , 2000, AMIA.

[3]  Shun-ichi Amari,et al.  Backpropagation and stochastic gradient descent method , 1993, Neurocomputing.

[4]  T. Griffiths,et al.  A Bayesian framework for word segmentation: Exploring the effects of context , 2009, Cognition.

[5]  David D. Lewis,et al.  Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval , 1998, ECML.

[6]  Alex Bateman,et al.  An introduction to hidden Markov models. , 2007, Current protocols in bioinformatics.

[7]  Shuying Shen,et al.  Textractor: a hybrid system for medications and reason for their prescription extraction from clinical text documents , 2010, J. Am. Medical Informatics Assoc..

[8]  F. Hall,et al.  Language of the radiology report: primer for residents and wayward radiologists. , 2000, AJR. American journal of roentgenology.

[9]  Meliha Yetisgen-Yildiz,et al.  Tumor information extraction in radiology reports for hepatocellular carcinoma patients , 2016, CRI.

[10]  Andrew McCallum,et al.  Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.

[11]  Alina A. von Davier,et al.  Cross-Validation , 2014 .

[12]  Thomas C. Rindflesch,et al.  MedPost: a part-of-speech tagger for bioMedical text , 2004, Bioinform..

[13]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[14]  Andrew McCallum,et al.  An Introduction to Conditional Random Fields , 2010, Found. Trends Mach. Learn..

[15]  C. Langlotz RadLex: a new method for indexing online educational materials. , 2006, Radiographics : a review publication of the Radiological Society of North America, Inc.

[16]  Sotiris B. Kotsiantis,et al.  Supervised Machine Learning: A Review of Classification Techniques , 2007, Informatica.

[17]  Yoshua Bengio,et al.  Architectural Complexity Measures of Recurrent Neural Networks , 2016, NIPS.

[18]  Xavier Carreras,et al.  A Simple Named Entity Extractor using AdaBoost , 2003, CoNLL.

[19]  Akiko Aizawa,et al.  An information-theoretic perspective of tf-idf measures , 2003, Inf. Process. Manag..

[20]  Son Doan,et al.  Application of information technology: MedEx: a medication information extraction system for clinical narratives , 2010, J. Am. Medical Informatics Assoc..

[21]  Wendy W. Chapman,et al.  A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries , 2001, J. Biomed. Informatics.

[22]  Andy Liaw,et al.  Classification and Regression by randomForest , 2007 .

[23]  Saeed Hassanpour,et al.  Artificial Intelligence in Medicine , 2015 .

[24]  Alan R. Aronson,et al.  Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program , 2001, AMIA.

[25]  S. Gunn Support Vector Machines for Classification and Regression , 1998 .

[26]  Stan Szpakowicz,et al.  Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation , 2006, Australian Conference on Artificial Intelligence.

[27]  Thomas H. Payne,et al.  A text processing pipeline to extract recommendations from radiology reports , 2013, J. Biomed. Informatics.

[28]  F. Rudzicz Human Language Technologies : The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics , 2010 .

[29]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[30]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[31]  Daniel Marcu,et al.  Learning as search optimization: approximate large margin methods for structured prediction , 2005, ICML.

[32]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[33]  Olivier Bodenreider,et al.  The Unified Medical Language System (UMLS): integrating biomedical terminology , 2004, Nucleic Acids Res..

[34]  Kalina Bontcheva,et al.  Human Language Technologies , 2009, Semantic Knowledge Management.

[35]  J. Nocedal Updating Quasi-Newton Matrices With Limited Storage , 1980 .

[36]  Peter Norvig,et al.  The Unreasonable Effectiveness of Data , 2009, IEEE Intelligent Systems.

[37]  Min Li,et al.  A Cascade Approach to Extracting Medication Events , 2009, ALTA.

[38]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.