Artificial Intelligence in Medicine

OBJECTIVE A major source of information available in electronic health record (EHR) systems is the clinical free text notes documenting patient care. Managing this information is time-consuming for clinicians. Automatic text summarisation could assist clinicians in obtaining an overview of the free text information in ongoing care episodes, as well as in writing final discharge summaries. We present a study of automated text summarisation of clinical notes. It aims to identify which methods are best suited to this task, and whether quality differences between summaries produced by different methods can be evaluated automatically in an efficient and reliable way.

METHODS AND MATERIALS The study is based on material consisting of 66,884 care episodes from EHRs of heart patients admitted to a university hospital in Finland between 2005 and 2009. We present novel extractive text summarisation methods for summarising the free text content of care episodes. Most of these methods rely on word space models constructed using distributional semantic modelling. Summarisation effectiveness is evaluated using an experimental automatic evaluation approach incorporating well-known ROUGE measures. We also developed a manual evaluation scheme to perform a meta-evaluation of the ROUGE measures, to see whether they reflect the opinions of health care professionals.

RESULTS The agreement between the human evaluators is good (ICC=0.74, p<0.001), demonstrating the stability of the proposed manual evaluation method. Furthermore, the correlation between the manual and automated evaluations is high (> 0.90 Spearman's rho). Three of the presented summarisation methods ('Composite', 'Case-Based' and 'Translate') significantly outperform the other methods on all ROUGE measures (p<0.05, Wilcoxon signed-rank test with Bonferroni correction).

CONCLUSION The results indicate the feasibility of automated summarisation of care episodes. Moreover, the high correlation between manual and automated evaluations suggests that the less labour-intensive automated evaluations can be used as a proxy for human evaluations when developing summarisation methods. This is of significant practical value for summarisation method development, because manual evaluation cannot be afforded for every variation of a summarisation method; automatic evaluation can be used instead during the development process.
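
As a rough illustration of the kind of automatic evaluation and meta-evaluation described above, the following sketch computes a simplified ROUGE-N recall for a candidate summary against a reference, and Spearman's rho between two lists of scores (for example, automatic scores and manual ratings per method). It is not the study's implementation: the function names, the whitespace tokenisation and the example sentences are assumptions made for illustration, and the official ROUGE package applies further options not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall: clipped n-gram overlap / reference n-grams.

    Assumption: lower-cased whitespace tokenisation, single reference.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the study's data.
    reference = "the patient was discharged home"
    candidate = "the patient was discharged"
    print(rouge_n_recall(candidate, reference, n=1))
```

In a meta-evaluation of this kind, one would compute an automatic score and a mean manual rating per summarisation method and pass the two lists to `spearman_rho`; a value close to 1 would indicate that the automatic measure ranks the methods much as the human evaluators do.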
