Automatic recognition of abdominal lymph nodes from clinical text

Lymph node status plays a pivotal role in cancer treatment. Extracting lymph node mentions from radiology text reports enables large-scale training of lymph node detection models on MRI. In this work, we first propose a hierarchical ontology of 41 types of abdominal lymph nodes. We then introduce an end-to-end approach that combines rules with transformer-based methods to detect abdominal lymph node mentions in MRI radiology reports and classify their types. We demonstrate the superior performance of MriBERT, a model obtained by fine-tuning BlueBERT on MRI reports. MriBERT outperforms both the rule-based labeler (0.957 vs. 0.644 in micro weighted F1-score) and other BERT-based variations (0.913 to 0.928). We make the code and MriBERT publicly available at https://github.com/ncbi-nlp/bluebert, in the hope that this method can facilitate the development of medical report annotators that produce labels from scratch at scale.
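
To make the rule-based side of such a pipeline concrete, the following is a minimal sketch of a keyword labeler, not the labeler evaluated in this work. The type names, keyword lists, and the function `label_sentence` are hypothetical placeholders chosen for illustration; the actual 41-type ontology and rules are not reproduced here.

```python
import re

# Hypothetical keyword-to-type map (assumption): a stand-in for a few entries
# of a lymph node ontology, not the 41-type ontology proposed in the paper.
TYPE_KEYWORDS = {
    "para-aortic": ["para-aortic", "paraaortic", "periaortic"],
    "mesenteric": ["mesenteric"],
    "retroperitoneal": ["retroperitoneal"],
}

# A sentence is considered only if it mentions a lymph node at all.
MENTION_RE = re.compile(
    r"\b(lymph\s+nodes?|adenopathy|lymphadenopathy)\b", re.IGNORECASE
)


def label_sentence(sentence):
    """Return the sorted lymph node types found in a sentence.

    Returns an empty list when the sentence has no lymph node mention,
    even if a location keyword is present.
    """
    if not MENTION_RE.search(sentence):
        return []
    lowered = sentence.lower()
    found = [
        node_type
        for node_type, keywords in TYPE_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    return sorted(found)


if __name__ == "__main__":
    print(label_sentence("Enlarged retroperitoneal and mesenteric lymph nodes."))
    print(label_sentence("Mesenteric vessels are patent."))
```

A labeler of this kind misses paraphrases and spelling variants outside its keyword lists, which is one plausible reason a fine-tuned transformer classifier could score higher on the same reports.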
