Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis

OBJECTIVES: We present a review of recent advances in clinical Natural Language Processing (NLP), with a focus on semantic analysis and the key subtasks that support such analysis.

METHODS: We conducted a literature review of clinical NLP research from 2008 to 2014, emphasizing recent publications (2012-2014), based on PubMed and ACL proceedings as well as relevant publications referenced in the included papers.

RESULTS: Significant articles published within this time span were included and are discussed from the perspective of semantic analysis. Three key clinical NLP subtasks that enable such analysis were identified: 1) developing more efficient methods for corpus creation (annotation and de-identification), 2) generating building blocks for extracting meaning (morphological, syntactic, and semantic subtasks), and 3) leveraging NLP for clinical utility (NLP applications and infrastructure for clinical use cases). Finally, we reflect on the most recent developments and on potential areas for future NLP development and applications.

CONCLUSIONS: Advances within key NLP subtasks that support semantic analysis have increased. In many cases, the performance of NLP semantic analysis is close to the level of agreement between humans. The creation and release of corpora annotated with complex semantic information models have greatly supported the development of new tools and approaches. Research on languages other than English continues to grow. In some cases, NLP methods have been employed successfully in real-world clinical tasks. However, there is still a gap between the development of advanced resources and their utilization in clinical settings. A plethora of new clinical use cases is emerging, owing to established health care initiatives and to additional patient-generated sources arising from the extensive use of social media and other devices.
