Clinical Text Data in Machine Learning: Systematic Review

Background: Clinical narratives represent the main form of communication within health care. They provide a personalized account of patient history and assessments and offer rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility for unlocking evidence buried in clinical narratives. Machine learning can facilitate the rapid development of NLP tools by leveraging large amounts of text data.

Objective: The main aim of this study was to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigated the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice.

Methods: Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified 110 relevant studies. From each, we extracted information about the text data used to support machine learning, the NLP tasks supported, and their clinical applications. The data properties considered included size, provenance, collection methods, annotation, and any relevant statistics.

Results: The majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, and a handful of studies used more. Relatively small datasets were used for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning iteratively selects a subset of the data for manual annotation. It was explored as a strategy for minimizing annotation effort while maximizing the predictive performance of the model.
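
To make the active learning loop concrete, here is a minimal sketch of pool-based active learning with uncertainty sampling. This is one common query strategy; the review does not say which strategies individual studies used. The sketch makes several assumptions, all illustrative:

- The learner is a toy binary naive Bayes classifier.
- The function names (`NaiveBayes`, `active_learning`) are hypothetical.
- The notes are invented.
- A dictionary of gold labels stands in for the human annotator.

In each round, the model is retrained on everything annotated so far. The unlabelled notes whose predicted probability is closest to 0.5 are then sent for annotation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Toy binary multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab_size = len(set(self.counts[0]) | set(self.counts[1]))
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def prob_positive(self, doc):
        n_docs = sum(self.doc_counts.values())
        log_scores = []
        for y in (0, 1):
            # Smoothed log prior plus smoothed log likelihood of each token.
            score = math.log((self.doc_counts[y] + 1) / (n_docs + 2))
            for tok in tokenize(doc):
                score += math.log((self.counts[y][tok] + 1) / (self.totals[y] + self.vocab_size))
            log_scores.append(score)
        # Normalise the two class scores into P(y=1 | doc); subtract the max for stability.
        m = max(log_scores)
        e0, e1 = (math.exp(s - m) for s in log_scores)
        return e1 / (e0 + e1)


def active_learning(seed, pool, oracle, budget, batch_size=1):
    """Pool-based active learning with uncertainty sampling (hypothetical helper).

    seed:   list of (text, label) pairs annotated up front
    pool:   unlabelled texts the learner may query
    oracle: callable returning the gold label of a text (stands in for a human annotator)
    budget: maximum number of extra annotations to request
    """
    labelled = list(seed)
    pool = list(pool)
    queried = []
    while pool and len(queried) < budget:
        model = NaiveBayes().fit([t for t, _ in labelled], [y for _, y in labelled])
        # The most uncertain notes are those whose P(positive) is closest to 0.5.
        pool.sort(key=lambda t: abs(model.prob_positive(t) - 0.5))
        k = min(batch_size, budget - len(queried))
        batch, pool = pool[:k], pool[k:]
        for text in batch:
            labelled.append((text, oracle(text)))  # the "manual annotation" step
            queried.append(text)
    final_model = NaiveBayes().fit([t for t, _ in labelled], [y for _, y in labelled])
    return final_model, queried


# Invented example: label 1 means the note reports that the patient has pain.
seed = [
    ("patient reports severe back pain", 1),
    ("no acute distress and vitals stable", 0),
]
pool = [
    "pain in left knee after fall",
    "follow up visit with no complaints",
    "chest pain radiating to arm",
    "routine medication refill",
    "denies pain today",
]
gold = {
    "pain in left knee after fall": 1,
    "follow up visit with no complaints": 0,
    "chest pain radiating to arm": 1,
    "routine medication refill": 0,
    "denies pain today": 0,
}
model, queried = active_learning(seed, pool, oracle=gold.__getitem__, budget=3)
print("queried for annotation:", queried)
print("P(pain | 'severe pain in knee') =", round(model.prob_positive("severe pain in knee"), 2))
```

With a budget of three, the learner spends its annotations on the notes it is least sure about. Confidently classified notes are skipped. In a real setting, the oracle call would be a request to a clinical annotator.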

Supervised learning was used successfully where clinical codes, integrated with free-text notes in electronic health records, served as class labels. Similarly, distant supervision used an existing knowledge base to annotate raw text automatically. Where manual annotation was unavoidable, crowdsourcing was explored. However, it remains unsuitable because of the sensitive nature of the data. Besides being small in volume, training data were typically sourced from a small number of institutions. They therefore offer no hard evidence about the transferability of machine learning models. The majority of studies focused on text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management, and surveillance.

Conclusions: We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as ways of reducing annotation effort. Future research in this field would benefit from alternatives that reduce or remove the need for manual annotation. These include data augmentation and transfer learning, as well as unsupervised learning, which requires no annotation at all.
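
Distant supervision, as summarized above, uses an existing knowledge base to annotate raw text automatically. The following is a minimal sketch of that idea for entity labelling, with these assumptions:

- The knowledge base is a tiny hypothetical lexicon (`KNOWLEDGE_BASE`). A real system would draw on a terminology such as the UMLS Metathesaurus.
- Labels are assigned by greedy longest-match BIO tagging. This is one simple labelling scheme, not necessarily the one used by the reviewed studies.
- The function names are illustrative.

```python
import re

# Hypothetical miniature knowledge base: surface form -> semantic type.
# The entries are illustrative only.
KNOWLEDGE_BASE = {
    "aspirin": "DRUG",
    "metformin": "DRUG",
    "type 2 diabetes": "DISORDER",
    "diabetes": "DISORDER",
    "chest pain": "SYMPTOM",
}


def tokenize(text):
    """Lower-case alphanumeric tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def distant_label(text, kb=KNOWLEDGE_BASE):
    """Tag each token with a BIO label by greedy longest match against the knowledge base."""
    tokens = tokenize(text)
    # Longest entries first, so "type 2 diabetes" wins over "diabetes".
    entries = sorted(((tokenize(term), label) for term, label in kb.items()),
                     key=lambda entry: len(entry[0]), reverse=True)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for term_tokens, label in entries:
            n = len(term_tokens)
            if tokens[i:i + n] == term_tokens:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            i += 1  # no knowledge-base entry starts here
    return list(zip(tokens, tags))


for token, tag in distant_label("Started metformin for type 2 diabetes; denies chest pain."):
    print(f"{token}\t{tag}")
```

No manual annotation is needed, but the resulting labels are noisy. For example, the negated mention "denies chest pain" is still tagged as a symptom, because string matching ignores context.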
