Validation and clinical discovery demonstration of a real-world data extraction platform

Objective: To validate and demonstrate the clinical discovery utility of a novel patient-mediated medical record collection and data extraction platform developed to improve access to, and utilization of, real-world clinical data.

Methods: Clinical variables were extracted from the medical records of consented patients with metastatic breast cancer. To validate the extracted data, case report forms completed using the platform's structured data output were compared with manual chart review for 50 patients. To demonstrate the platform's clinical discovery utility, we assessed associations between time to distant metastasis (TDM) and tumor histology, molecular type, and germline BRCA status in the platform-extracted data of 194 patients.

Results: Compared with manual chart review, the platform-extracted data had 97.6% precision (91.98%–100% by variable type) and 81.48% recall (58.15%–95.00% by variable type). In the discovery cohort, shorter TDM was significantly associated with metaplastic (739.0 days) and inflammatory (1,005.8 days) histologies, the HR-/HER2- molecular type (1,187.4 days), and positive BRCA status (1,042.5 days), compared with other histologies, other molecular types, and negative BRCA status, respectively. Multivariable analyses did not yield statistically significant results; the average TDMs are nonetheless reported.

Discussion: The platform-extracted clinical data are precise and comprehensive, and they can generate clinically relevant insights.

Conclusion: The structured real-world data produced by a patient-mediated medical record extraction platform are reliable and can power clinical discovery.

Keywords: data accuracy; electronic health records; real-world data; real-world evidence
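The validation reported in the abstract (precision and recall of platform-derived case report forms against manual chart review, overall and by variable type) does not state its counting rules. The sketch below illustrates one common field-level convention and is an assumption, not the authors' protocol: a field is a true positive (TP) when the platform value matches chart review, a wrong value counts as both a false positive (FP) and a false negative (FN), and fields left blank in both sources are ignored. Precision is then TP / (TP + FP) and recall is TP / (TP + FN). The names `FieldComparison` and `precision_recall` and the example values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldComparison:
    """One case report form field for one patient (hypothetical structure)."""
    variable_type: str              # e.g. "histology", "biomarker"
    platform_value: Optional[str]   # value from the platform-derived form; None if blank
    chart_value: Optional[str]      # value from manual chart review; None if blank


def precision_recall(rows):
    """Return {variable_type: (precision, recall)} plus an "overall" entry.

    Assumed counting rules (not taken from the paper):
      - TP: platform filled the field and it equals the chart-review value.
      - FP: platform filled the field but chart review is blank or disagrees.
      - FN: chart review has a value the platform missed or got wrong.
      - Fields blank in both sources are not counted.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [tp, fp, fn] per key
    for r in rows:
        for key in (r.variable_type, "overall"):
            c = counts[key]
            if r.platform_value is not None and r.platform_value == r.chart_value:
                c[0] += 1
            else:
                if r.platform_value is not None:
                    c[1] += 1  # platform asserted something unsupported
                if r.chart_value is not None:
                    c[2] += 1  # chart-review value not recovered
    result = {}
    for key, (tp, fp, fn) in counts.items():
        precision = tp / (tp + fp) if tp + fp else float("nan")
        recall = tp / (tp + fn) if tp + fn else float("nan")
        result[key] = (precision, recall)
    return result


# Hypothetical example fields, not study data.
rows = [
    FieldComparison("histology", "ductal", "ductal"),   # TP
    FieldComparison("histology", None, "lobular"),      # FN (platform missed it)
    FieldComparison("biomarker", "ER+", "ER+"),         # TP
    FieldComparison("biomarker", "HER2+", "HER2-"),     # FP and FN (wrong value)
]
metrics = precision_recall(rows)
print(metrics["histology"])  # (1.0, 0.5)
print(metrics["overall"])    # (0.6666666666666666, 0.5)
```

Under this convention a missing value lowers only recall, while a wrong value lowers both metrics. That is one way a platform can show high precision alongside noticeably lower recall.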
