Automatic Summary Creation by Applying Natural Language Processing on Unstructured Medical Records

In this paper we present a system for automatically generating summaries of patients' unstructured medical reports. The system applies Natural Language Processing techniques to determine the most relevant points in a report and uses the MetaMap module to recognize the medical concepts it contains. Sentences that do not contain relevant concepts are then removed, and a summary is generated that includes URL links to the Linked Life Data pages of the identified medical concepts, enabling both medical doctors and patients to explore the reported findings further. This integration also allows the tool to interface with other semantic web-based applications. The performance of the tool was also evaluated, with strong results in sentence identification, polarity detection and concept recognition. Moreover, the accuracy of the generated summaries was evaluated by five medical doctors, who found that the summaries retain the relevant information of the original medical reports while being much more concise.
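The pipeline described above (sentence identification, concept recognition, polarity detection, filtering, linking) can be illustrated with a minimal sketch. This is not the authors' implementation: the small `TOY_CONCEPTS` dictionary is a hypothetical stand-in for MetaMap, the cue list in `NEGATION_CUES` is a simplified assumption loosely inspired by cue-based negation detection, and the `LLD_URL` pattern is an assumed form of a Linked Life Data concept URL.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for MetaMap: surface term -> UMLS concept identifier.
# A real system would obtain these mappings from MetaMap itself.
TOY_CONCEPTS = {
    "diabetes": "C0011849",
    "hypertension": "C0020538",
    "fever": "C0015967",
}

# Assumed URL pattern for Linked Life Data concept pages (illustrative only).
LLD_URL = "http://linkedlifedata.com/resource/umls/id/{cui}"

# Simplified negation cues (an assumption, not the paper's polarity detector).
NEGATION_CUES = re.compile(r"\b(no|not|denies|without|negative for)\b")


@dataclass
class Finding:
    term: str
    cui: str
    negated: bool
    url: str


def split_sentences(text):
    """Naive sentence identification: split after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_concepts(sentence):
    """Return the known concepts in a sentence, each with a polarity flag."""
    lowered = sentence.lower()
    findings = []
    for term, cui in TOY_CONCEPTS.items():
        pos = lowered.find(term)
        if pos == -1:
            continue
        # A concept counts as negated if a cue precedes it in the sentence.
        negated = NEGATION_CUES.search(lowered[:pos]) is not None
        findings.append(Finding(term, cui, negated, LLD_URL.format(cui=cui)))
    return findings


def summarize(report):
    """Keep only sentences with at least one recognized concept."""
    summary = []
    for sentence in split_sentences(report):
        findings = find_concepts(sentence)
        if findings:
            summary.append((sentence, findings))
    return summary
```

Calling `summarize` on a short report returns only the sentences that mention a known concept, each paired with its findings, so a caller can render the concept terms as links and mark negated findings.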
