A natural language processing pipeline for pairing measurements uniquely across free-text CT reports

OBJECTIVE To standardize and objectify treatment response assessment in oncology, guidelines have been proposed that are driven by radiological measurements. These measurements are typically communicated in free-text reports, which defy automated processing. Through inter-annotator agreement analysis and natural language processing (NLP) algorithm development, we study the task of pairing measurements that quantify the same finding across consecutive radiology reports, such that each measurement is paired with at most one other ("partial uniqueness").

METHODS AND MATERIALS Ground truth is created from 283 abdomen and 311 chest CT reports, drawn from 50 patients each. A pre-processing engine segments reports and extracts measurements. Thirteen features are developed based on volumetric similarity between measurements, semantic similarity between their respective narrative contexts, and structural properties of their report positions. A Random Forest classifier (RF) integrates all features. A "mutual best match" (MBM) post-processor ensures partial uniqueness.

RESULTS In an end-to-end evaluation, RF has precision 0.841, recall 0.807, F-measure 0.824 and AUC 0.971. With MBM, which performs above chance level (P<0.001), it has precision 0.899, recall 0.776, F-measure 0.833 and AUC 0.935. RF (RF+MBM) has error-free performance on 52.7% (57.4%) of report pairs.

DISCUSSION Inter-annotator agreement of three domain specialists with the ground truth (κ>0.960) indicates that the task is well defined. Domain properties and inter-section differences are discussed to explain the superior performance on abdomen reports. Enforcing partial uniqueness has mixed but minor effects on performance: precision and F-measure rise, while recall and AUC fall.

CONCLUSION A combined machine learning and filtering approach is proposed for pairing measurements. It can support both prospective purposes (treatment response assessment) and retrospective purposes (data mining).
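The abstract does not spell out how the "mutual best match" post-processor works. The following is a minimal sketch of one plausible reading: a candidate pair is kept only if each measurement is the other's highest-scoring partner, which guarantees that every measurement appears in at most one pair. The function name `mutual_best_match`, the score dictionary layout, and the `threshold` parameter are assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Tuple


def mutual_best_match(
    scores: Dict[Tuple[int, int], float],
    threshold: float = 0.5,  # hypothetical cut-off on the classifier score
) -> List[Tuple[int, int]]:
    """Keep pair (i, j) only if j is i's best partner and i is j's best partner.

    `scores` maps (prior_measurement_id, current_measurement_id) to a
    classifier confidence. Ties are broken in favour of the pair seen first.
    """
    best_for_prior: Dict[int, Tuple[int, float]] = {}
    best_for_current: Dict[int, Tuple[int, float]] = {}

    for (i, j), score in scores.items():
        if score < threshold:
            continue  # discard candidates the classifier considers unlikely
        if i not in best_for_prior or score > best_for_prior[i][1]:
            best_for_prior[i] = (j, score)
        if j not in best_for_current or score > best_for_current[j][1]:
            best_for_current[j] = (i, score)

    # A pair survives only when the preference is mutual.
    return sorted(
        (i, j)
        for i, (j, _) in best_for_prior.items()
        if best_for_current[j][0] == i
    )


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

In this sketch, a measurement whose best partner prefers a different measurement is left unpaired, which is consistent with the reported trade-off of higher precision and lower recall after MBM. The `f_measure` helper reproduces the reported F-measures from the reported precision and recall values (0.824 for RF, 0.833 for RF+MBM, after rounding).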
