Extracting Clinical Information From Japanese Radiology Reports Using a 2-Stage Deep Learning Approach: Algorithm Development and Validation

Background: Radiology reports are usually written in free text, which makes them difficult to reuse.

Objective: To enable secondary use of these reports, we developed a 2-stage deep learning system that extracts clinical information and converts it into a structured format.

Methods: Our system consists mainly of 2 deep learning modules: entity extraction and relation extraction. For each module, we applied state-of-the-art deep learning models. We trained and evaluated the models using 1040 in-house Japanese computed tomography (CT) reports annotated by medical experts. We also evaluated the performance of the entire pipeline. In addition, to validate how well our information model covers the clinical information in the reports, we measured the ratio of annotated entities in the reports.

Results: The microaveraged F1-scores of our best-performing models for entity extraction and relation extraction were 96.1% and 97.4%, respectively. The microaveraged F1-score of the 2-stage system, which measures the performance of the entire pipeline, was 91.9%. Our system showed encouraging results for converting free-text radiology reports into a structured format. The coverage of clinical information in the reports was 96.2% (6595/6853).

Conclusions: Our 2-stage deep learning system can accurately and comprehensively extract clinical information from chest and abdominal CT reports.
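The abstract reports a microaveraged F1-score for each module and a lower one for the whole pipeline. The gap arises because stage 2 consumes stage 1's predictions rather than gold entities. Below is a minimal sketch of that kind of evaluation. It assumes exact-match scoring on hypothetical `(start, end, label)` entity tuples and `(head, tail, label)` relation tuples. The labels `Anatomy`, `Finding` and `located_in`, the helper `micro_f1`, and the toy data are all illustrative and are not taken from the paper, whose exact matching criteria are not given here.

```python
from typing import List, Sequence, Tuple

# Hypothetical tuple formats (illustrative, not the paper's schema):
#   entity   = (start_offset, end_offset, label)
#   relation = (head_entity, tail_entity, label)
Entity = Tuple[int, int, str]
Relation = Tuple[Entity, Entity, str]


def micro_f1(gold_docs: Sequence[Sequence[tuple]],
             pred_docs: Sequence[Sequence[tuple]]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1, micro-averaged.

    True positives, false positives and false negatives are pooled over
    every document and every label before the ratios are computed.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)
        fp += len(pred_set - gold_set)
        fn += len(gold_set - pred_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy single-report example with hypothetical labels.
gold_ents: List[Entity] = [(0, 2, "Anatomy"), (3, 7, "Finding"), (8, 12, "Finding")]
gold_rels: List[Relation] = [
    ((3, 7, "Finding"), (0, 2, "Anatomy"), "located_in"),
    ((8, 12, "Finding"), (0, 2, "Anatomy"), "located_in"),
]

# Stage 1 output: one span boundary is off by one character.
pred_ents: List[Entity] = [(0, 2, "Anatomy"), (3, 6, "Finding"), (8, 12, "Finding")]
# Stage 2 runs on the *predicted* entities, so the boundary error carries over
# even though the relation step picked the right pairs and labels.
pred_rels: List[Relation] = [
    ((3, 6, "Finding"), (0, 2, "Anatomy"), "located_in"),
    ((8, 12, "Finding"), (0, 2, "Anatomy"), "located_in"),
]

if __name__ == "__main__":
    print("entity F1:     %.3f" % micro_f1([gold_ents], [pred_ents])[2])
    print("end-to-end F1: %.3f" % micro_f1([gold_rels], [pred_rels])[2])
    # Coverage figure from the abstract: 6595 of 6853.
    print("coverage:      %.1f%%" % (100 * 6595 / 6853))
```

Running it prints an entity F1 of 0.667, an end-to-end relation F1 of 0.500 and a coverage of 96.2%. The relation score drops even though the relation step made no mistake of its own, which illustrates why a pipeline score sits below the per-module scores. Exact matching is the strictest choice. A partial-overlap criterion would score the shifted span more leniently.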
