Developing Customizable Cancer Information Extraction Modules for Pathology Reports Using CLAMP

Natural language processing (NLP) technologies have been successfully applied to cancer research, enabling automated extraction of phenotypic information from narratives in electronic health records (EHRs) such as pathology reports; however, developing customized NLP solutions requires substantial effort. To facilitate the adoption of NLP in cancer research, we have developed CLAMP-Cancer, a set of customizable modules for extracting a comprehensive range of cancer-related information from pathology reports (e.g., tumor size, tumor stage, and biomarkers). The modules build on the existing CLAMP system, which provides user-friendly interfaces for building NLP solutions customized to individual needs. Evaluation on annotated data from Vanderbilt University Medical Center showed that CLAMP-Cancer could extract diverse types of cancer information with F-measures ranging from 0.80 to 0.98. We then applied CLAMP-Cancer to an information extraction task at Mayo Clinic and showed that a customized NLP system could be built quickly, with performance comparable to that of an existing system at Mayo Clinic. CLAMP-Cancer is freely available for academic use.
