Making Sense of Big Textual Data for Health Care: Findings from the Section on Clinical Natural Language Processing

Objectives: To summarize recent research and present a selection of the best papers published in 2016 in the field of clinical Natural Language Processing (NLP).

Method: The two section editors of the IMIA Yearbook NLP section surveyed the literature. Bibliographic databases were searched for papers focusing on NLP efforts applied to clinical texts or aimed at a clinical outcome. Papers were automatically ranked and then manually reviewed on the basis of their titles and abstracts. The section editors first selected a shortlist of candidate best papers, which independent external reviewers then peer-reviewed.

Results: The five clinical NLP best papers make contributions that range from emerging, original foundational methods to the transfer of solid, established research results into practical clinical settings. They offer frameworks for abbreviation disambiguation and for coreference resolution, a classification method to identify clinically useful sentences, an analysis of counseling conversations aimed at improving support for patients with mental disorders, and a method for grounding gradable adjectives.

Conclusions: Clinical NLP continued to thrive in 2016, with contributions increasingly oriented towards applications rather than fundamental methods. Fundamental work addresses increasingly complex problems such as lexical semantics, coreference resolution, and discourse analysis. Research results are being translated into freely available tools, mainly for English.
