Development of a generalizable natural language processing pipeline to extract physician-reported pain from clinical reports: Generated using publicly-available datasets and tested on institutional clinical reports for cancer patients with bone metastases

OBJECTIVE The majority of cancer patients suffer from severe pain at the advanced stage of their illness. In most cases, cancer pain is underestimated by clinical staff and is not properly managed until it reaches a critical stage. Detecting and addressing cancer pain early can therefore potentially improve the quality of life of cancer patients. The objective of this research project was to develop a generalizable Natural Language Processing (NLP) pipeline to find and classify physician-reported pain in the radiation oncology consultation notes of cancer patients with bone metastases.
MATERIALS AND METHODS The texts of 1,249 publicly-available hospital discharge notes in the i2b2 database were used as a training and validation set. The MetaMap and NegEx algorithms were implemented for medical term extraction. Sets of NLP rules were developed to score pain terms in each note. By averaging pain scores, each note was assigned one of three verbally-declared pain (VDP) labels: no pain, pain, or no mention of pain. Without further training, the generalizability of our pipeline in scoring individual pain terms was tested independently using 30 hospital discharge notes from the MIMIC-III database and 30 consultation notes of cancer patients with bone metastases from our institution's radiation oncology electronic health record. Finally, 150 notes from our institution were used to assess the pipeline's performance at assigning VDP.
RESULTS Our NLP pipeline detected and quantified pain in the i2b2 summary notes with 93% overall precision and 92% overall recall. Testing on the MIMIC-III database achieved precision and recall of 91% and 86%, respectively. The pipeline detected pain with 89% precision and 82% recall on our institutional radiation oncology corpus. Finally, our pipeline assigned a VDP label to each note in our institutional corpus with 84% precision and 82% recall.
CONCLUSION Our NLP pipeline enables the detection and classification of physician-reported pain in our radiation oncology corpus. This portable and ready-to-use pipeline can be used to automatically extract and classify physician-reported pain from clinical notes where the pain is not otherwise documented through structured data entry.
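
The scoring-and-averaging step described in the methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pipeline itself relies on MetaMap for term extraction and NegEx for negation, whereas this sketch substitutes a small hypothetical lexicon (`PAIN_TERMS`), a hypothetical list of negation triggers (`NEGATION_TRIGGERS`), an assumed look-back window (`WINDOW`), and an assumed threshold of 0.5 on the averaged score. The abstract does not state the actual scoring rules or cut-offs.

```python
import re
from statistics import mean

# Hypothetical stand-ins for MetaMap concept extraction and NegEx triggers.
PAIN_TERMS = ("pain", "ache", "tenderness", "discomfort")
NEGATION_TRIGGERS = ("no", "not", "denies", "without")
WINDOW = 5  # assumption: a trigger negates pain terms up to 5 tokens later


def score_pain_terms(note):
    """Return one score per pain mention: 1.0 if affirmed, 0.0 if negated."""
    scores = []
    # Negation scope is limited to the sentence containing the pain term.
    for sentence in re.split(r"[.!?]\s*", note.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for i, token in enumerate(tokens):
            if token in PAIN_TERMS:
                preceding = tokens[max(0, i - WINDOW):i]
                negated = any(t in NEGATION_TRIGGERS for t in preceding)
                scores.append(0.0 if negated else 1.0)
    return scores


def assign_vdp(note, threshold=0.5):
    """Average the per-term scores and map the note to one of three VDP labels."""
    scores = score_pain_terms(note)
    if not scores:
        return "no mention of pain"
    # Assumed rule: a mean at or above the threshold counts as pain.
    return "pain" if mean(scores) >= threshold else "no pain"
```

Under these assumptions, `assign_vdp("Patient denies pain. No tenderness on palpation.")` returns `"no pain"`, while a note with no pain term at all falls into the `"no mention of pain"` class.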
