Clinical Text Retrieval - An Overview of Basic Building Blocks and Applications

This article describes information retrieval, natural language processing, and text mining applied to electronic patient record text, also called clinical text. Clinical text is written by physicians and nurses to document the patient's health care process. We first describe some characteristics of clinical text, and then the automatic preprocessing needed to make the text usable for applications. We then describe applications for clinicians, including spelling and grammar checking and ICD-10 diagnosis code assignment, as well as applications for hospital management, such as ICD-10 diagnosis code validation and detection of adverse events such as hospital-acquired infections. Part of the preprocessing makes the clinical text useful for faceted search, although clinical text already comes with some keys for faceted search, such as gender, age, ICD-10 diagnosis codes, and ATC drug codes. The preprocessing makes use of ICD-10 codes and the SNOMED CT textual descriptions. ICD-10 and SNOMED CT are available in several languages and can be considered the modern Greek or Latin of medical language. The basic research presented here has its roots in challenges described by the health care sector. These challenges have been partially solved in academia, and we believe the solutions will be adapted to real-world applications in the health care sector.
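As an illustration of how the existing keys can drive faceted search, the following minimal Python sketch filters a small set of records by gender, age, ICD-10 code prefix, ATC code prefix, and a free-text term. The `Record` class, the `faceted_search` function, and the sample data are hypothetical and introduced only for this example; they do not describe a system from the article.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical patient record: structured keys plus free text."""
    gender: str
    age: int
    icd10: set = field(default_factory=set)  # ICD-10 diagnosis codes
    atc: set = field(default_factory=set)    # ATC drug codes
    text: str = ""


def faceted_search(records, gender=None, age_range=None,
                   icd10_prefix=None, atc_prefix=None, term=None):
    """Return the records that match every facet that was supplied.

    Facets left as None are ignored. Code facets match by prefix, so
    "J18" matches "J18.9". The text facet is a plain case-insensitive
    substring match, with no linguistic preprocessing.
    """
    hits = []
    for r in records:
        if gender is not None and r.gender != gender:
            continue
        if age_range is not None and not (age_range[0] <= r.age <= age_range[1]):
            continue
        if icd10_prefix is not None and not any(
                code.startswith(icd10_prefix) for code in r.icd10):
            continue
        if atc_prefix is not None and not any(
                code.startswith(atc_prefix) for code in r.atc):
            continue
        if term is not None and term.lower() not in r.text.lower():
            continue
        hits.append(r)
    return hits


# Toy data, invented for illustration only.
records = [
    Record("F", 72, {"J18.9"}, {"J01CA04"}, "Patient with cough and fever."),
    Record("M", 55, {"I10"}, {"C09AA02"}, "Blood pressure stable."),
    Record("F", 34, {"J18.9"}, set(), "Mild cough, recovering."),
]

# Women aged 65 or older with a diagnosis code starting with J18.
elderly_pneumonia = faceted_search(
    records, gender="F", age_range=(65, 120), icd10_prefix="J18")
```

The structured facets narrow the result set before the free-text term is applied. The substring match is deliberately naive, so it stands in for the step where preprocessed clinical text would contribute further search keys.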
