Modeling cancer clinical trials using HL7 FHIR to support downstream applications: A case study with colorectal cancer data

BACKGROUND AND OBJECTIVE: Identifying and standardizing the data elements used in clinical trials may control and reduce costs and errors in trial operations, and enable seamless data exchange between electronic data capture (EDC) systems and electronic health record (EHR) systems. This study presents a methodology for comprehensively capturing the data element needs of clinical trials.

MATERIALS AND METHODS: Case report forms (CRFs) for clinical trial data collection were used to approximate the clinical information needs. Each information need was then mapped to a semantically equivalent field within an existing FHIR cancer profile. Items without a semantically equivalent field were treated as information needs that cannot be represented in current standards, and we proposed extensions to support them.

RESULTS: We identified 62 discrete items from a preliminary survey of 43 base questions in four CRFs used in colorectal cancer clinical trials; 28 of these items, together with their associated responses, were modeled with FHIR extensions for colorectal cancer. Automatic population of the CRFs achieved an average precision of 98.5%, recall of 96.2%, and F-measure of 96.8% across all base questions. We also demonstrated that the auto-filled answers in CRFs can be used to discover patient subgroups using a topic modeling approach.

CONCLUSION: CRFs can be considered a proxy for the information needs of their respective cancer types. Mining these information needs can serve as a valuable resource for expanding existing standards so that they can comprehensively represent relevant clinical data without loss of granularity.
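The reported averages can look inconsistent at first glance, because 96.8% is not the harmonic mean of 98.5% and 96.2%. One plausible reading, which the abstract does not confirm, is that each metric was computed per base question and then averaged across questions. The Python sketch below illustrates that kind of per-question averaging. The `Counts` class, the question names, and all tallies are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical tallies for one base question (not data from the study)."""
    tp: int  # auto-filled answers that match the gold standard
    fp: int  # auto-filled answers that do not match the gold standard
    fn: int  # gold-standard answers the pipeline failed to fill


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f_measure(c: Counts) -> float:
    # Balanced F-measure: harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def macro_average(per_question: dict) -> tuple:
    """Average each metric over questions, weighting every question equally."""
    n = len(per_question)
    avg_p = sum(precision(c) for c in per_question.values()) / n
    avg_r = sum(recall(c) for c in per_question.values()) / n
    avg_f = sum(f_measure(c) for c in per_question.values()) / n
    return avg_p, avg_r, avg_f


# Two made-up base questions with different error profiles.
example = {
    "question_a": Counts(tp=9, fp=1, fn=0),
    "question_b": Counts(tp=5, fp=0, fn=5),
}

avg_p, avg_r, avg_f = macro_average(example)

# Harmonic mean of the averaged precision and recall, for comparison only.
f_of_averages = 2 * avg_p * avg_r / (avg_p + avg_r)

print(f"average precision: {avg_p:.3f}")
print(f"average recall:    {avg_r:.3f}")
print(f"average F-measure: {avg_f:.3f}")
print(f"F of the averages: {f_of_averages:.3f}")
```

With these made-up tallies the averaged F-measure (about 0.807) differs from the F-measure computed from the averaged precision and recall (about 0.838), which shows how an averaged F-measure need not equal the harmonic mean of the averaged precision and recall.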
