Information extraction from pathology reports in a hospital setting

As more health data becomes available, information extraction aims to improve the workflows of hospitals and care centers. One target area is the management of pathology reports, which are used for cancer diagnosis and staging. In this work we integrate text mining tools into the workflow of the Royal Melbourne Hospital to extract information from pathology reports with minimal expert intervention. Our framework relies on coarse-grained, document-level annotation, which makes it highly portable. Our evaluation shows that the language used in these reports makes it feasible to extract information with high precision and recall using state-of-the-art classification methods and feature engineering.
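To illustrate what document-level annotation means in practice, the following is a minimal sketch, not the system described in this work. It assumes each report carries a single label for the whole document, uses unigram and bigram counts as a simple stand-in for feature engineering, and swaps in a small multinomial naive Bayes classifier for the classification methods mentioned above. The report texts, the labels `malignant` and `benign`, and the names `features` and `NaiveBayes` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def features(text):
    """Lowercased unigrams plus bigrams (an assumed, very simple feature set)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over document-level labels."""

    def fit(self, docs, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = features(doc)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, doc):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for feat in features(doc):
                if feat in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy reports: one label per document, no span-level annotation.
train_docs = [
    "Sections show invasive adenocarcinoma extending into the muscularis propria.",
    "Moderately differentiated adenocarcinoma with lymph node metastasis.",
    "Benign hyperplastic polyp. No evidence of malignancy.",
    "Normal colonic mucosa. No dysplasia or malignancy seen.",
]
train_labels = ["malignant", "malignant", "benign", "benign"]

model = NaiveBayes().fit(train_docs, train_labels)
```

Calling `model.predict("Invasive adenocarcinoma identified in the resection specimen.")` returns `"malignant"` on this toy data. The annotator only assigns one label per report, so no token-level or span-level markup is needed, which is the property that keeps the annotation effort low.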
