On Evaluation of Automatically Generated Clinical Discharge Summaries

Proper evaluation is crucial for developing high-quality computerized text summarization systems. In the clinical domain, the specialized information needs of clinicians complicate the task of evaluating automatically produced clinical text summaries. In this paper we present and compare results from both manual and automatic evaluation of computer-generated summaries. These summaries are composed of sentence extracts from the free text of clinical daily notes, written by physicians about patient care and corresponding to individual care episodes. The primary purpose of this study is to determine whether the automatic evaluation correlates with the manual evaluation, and to identify which of the automatic evaluation metrics correlates best with the scores from the manual evaluation. The manual evaluation is performed by domain experts following an evaluation tool developed as part of this study. From this comparison we hope to gain insight into the reliability of the selected automatic evaluation approach, so that we can further develop the underlying summarization system. The evaluation results are promising: the ranking order of the summarization methods produced by each of the automatic evaluation metrics corresponds well with that of the manual evaluation. These preliminary results also indicate that the automatic evaluation setup can serve as a reliable, automated way to rank clinical summarization methods internally by performance.
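The core comparison described above, checking whether an automatic metric ranks summarization methods the same way the domain experts do, can be sketched with a rank correlation between the two score sets. This is a minimal illustration, not the paper's actual code: the method scores below are hypothetical, and Spearman's rho is implemented from scratch so the sketch is self-contained.

```python
# Sketch: does an automatic metric rank summarization methods
# the same way manual expert scores do? (hypothetical data)

def ranks(values):
    """Assign 1-based ranks (1 = lowest); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of tied positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for four summarization methods:
manual_scores = [3.1, 2.4, 4.0, 3.6]        # mean expert ratings
automatic_scores = [0.42, 0.35, 0.51, 0.47]  # e.g., a ROUGE-style metric

rho = spearman(manual_scores, automatic_scores)
print(f"Spearman rho = {rho:.2f}")  # 1.00 here: the rankings agree exactly
```

A rho near 1 means the automatic metric orders the methods the same way the experts do, which is the property the study tests; in practice one would also report a significance test alongside the correlation.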
