In-depth annotation for patient level liver cancer staging

Cancer stages, which summarize the extent of cancer progression, are an important tool for evidence-based medical research. However, they are not always recorded in the electronic medical record. In this paper, we describe our work on annotating a medical text corpus with the goal of predicting patient-level liver cancer staging in hepatocellular carcinoma (HCC) patients. Our annotation consisted of identifying 11 parameters used to calculate liver cancer staging, both at the text span level and at the patient level. Also at the patient level, we annotated stages for three commonly used liver cancer staging schemes. Inter-rater agreement showed a text annotation consistency of 0.73 F1 for partial text match and 0.91 F1 at the patient level. After annotation, we performed several document classification experiments for the text span annotations using standard machine learning classifiers, including decision trees, maximum entropy, naive Bayes, and support vector machines. These experiments established a baseline performance of 0.63 F1 for our task and identified strategies for future improvement.
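
To make the partial-match agreement figure concrete, the following is a minimal sketch of how an F1 score between two annotators' text spans could be computed. It is an illustration under stated assumptions, not the procedure used in this work: the abstract does not define the matching criterion, so the sketch assumes that two spans match when they carry the same label and share at least one character. The `Span` layout, the function names, and the example labels are hypothetical.

```python
from typing import List, Tuple

# Hypothetical span representation: (start offset, end offset, label),
# with the end offset exclusive.
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """Assumed partial-match rule: same label and at least one shared character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def partial_match_f1(annotator_a: List[Span], annotator_b: List[Span]) -> float:
    """F1 agreement, treating annotator A as reference and annotator B as response."""
    if not annotator_a or not annotator_b:
        return 0.0
    # Response spans that overlap some reference span count toward precision.
    matched_b = sum(1 for b in annotator_b if any(overlaps(a, b) for a in annotator_a))
    # Reference spans that overlap some response span count toward recall.
    matched_a = sum(1 for a in annotator_a if any(overlaps(a, b) for b in annotator_b))
    precision = matched_b / len(annotator_b)
    recall = matched_a / len(annotator_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative (made-up) annotations of a single document.
spans_a = [(0, 10, "tumor_size"), (20, 30, "ascites")]
spans_b = [(5, 12, "tumor_size"), (40, 50, "ascites")]
agreement = partial_match_f1(spans_a, spans_b)
```

In this made-up example, one of the two spans from each annotator overlaps a same-label span from the other, so precision and recall are both 0.5. An exact-match variant would replace the overlap test with equality of offsets and would typically yield a lower score.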
