Clinical Natural Language Processing in 2014: Foundational Methods Supporting Efficient Healthcare

OBJECTIVE To summarize recent research and present a selection of the best papers published in 2014 in the field of clinical Natural Language Processing (NLP).

METHOD The two section editors of the IMIA Yearbook NLP section performed a systematic review of the literature by searching bibliographic databases, focusing on NLP efforts applied to clinical texts or aimed at a clinical outcome. The section editors first selected a shortlist of candidate best papers, which was then peer-reviewed by independent external reviewers.

RESULTS The clinical NLP best paper selection shows that the field is tackling text analysis methods of increasing depth. The full review process highlighted five papers addressing foundational methods in clinical NLP. These papers use clinically relevant texts from online forums or encyclopedias and clinical texts from Electronic Health Records, and include studies specifically aiming at a practical clinical outcome. Increased access to clinical data, made possible by recent progress in de-identification, has paved the way for the scientific community to address complex NLP problems such as word sense disambiguation, negation, temporal analysis, and extraction of specific information nuggets. These advances have in turn allowed efficient application of NLP to clinical problems such as cancer patient triage. Another line of research investigates online clinically relevant texts and offers insight into communication strategies for conveying health-related information.

CONCLUSIONS The field of clinical NLP is thriving through the contributions of both NLP researchers and healthcare professionals interested in applying NLP techniques for concrete healthcare purposes. Clinical NLP is maturing toward practical applications with significant clinical impact.
