Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes.

PURPOSE Cancer research using electronic health records and genomic data sets requires clinical outcomes data, which may be recorded only in unstructured text by treating oncologists. Natural language processing (NLP) could substantially accelerate extraction of this information.

METHODS Patients with lung cancer who had tumor sequencing as part of a single-institution precision oncology study from 2013 to 2018 were identified. Medical oncologists' progress notes for these patients were reviewed. For each note, curators recorded whether the assessment/plan indicated any cancer, progression/worsening of disease, and/or response to therapy or improving disease. Next, a recurrent neural network was trained using unlabeled notes to extract the assessment/plan from each note. Finally, convolutional neural networks were trained on labeled assessments/plans to predict the probability that each curated outcome was present. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) among a held-out test set of 10% of patients. Associations between curated response or progression end points and overall survival were measured using Cox models among patients receiving palliative-intent systemic therapy.

RESULTS Medical oncologist notes (n = 7,597) were manually curated for 919 patients. In the 10% test set, NLP models replicated human curation with AUROCs of 0.94 for the any-cancer outcome, 0.86 for the progression outcome, and 0.90 for the response outcome. Progression/worsening events identified using NLP models were associated with shortened survival (hazard ratio [HR] for mortality, 2.49; 95% CI, 2.00 to 3.09); response/improvement events were associated with improved survival (HR, 0.45; 95% CI, 0.30 to 0.67).

CONCLUSION NLP models based on neural networks can extract meaningful outcomes from oncologist notes at scale. Such models may facilitate identification of clinical and genomic features associated with response to cancer treatment.
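
The evaluation step described in METHODS (holding out 10% of patients, then scoring predicted probabilities by AUROC) can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_by_patient` and `auroc`, the random seed, and the toy data are assumptions made for illustration only. The sketch splits at the patient level, so that all notes from one patient fall on the same side of the split, and computes AUROC from average ranks (the Mann-Whitney formulation) using only the Python standard library.

```python
import random


def split_by_patient(patient_ids, test_fraction=0.1, seed=0):
    """Return the set of patient IDs held out for testing.

    Splitting on patients rather than notes keeps every note from a
    given patient on one side of the split. (Hypothetical helper.)
    """
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    return set(unique[:n_test])


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic, with average ranks for ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Toy example: 20 hypothetical patients with two notes each.
note_patients = [f"patient_{i:02d}" for i in range(20) for _ in range(2)]
test_patients = split_by_patient(note_patients, test_fraction=0.1)
test_note_indices = [i for i, p in enumerate(note_patients) if p in test_patients]

# Toy labels and predicted probabilities for four notes.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.86 to 0.94.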
