Developing a Data and Analytics Platform to Enable a Breast Cancer Learning Health System at a Regional Cancer Center

PURPOSE This study documents the creation of an automated, longitudinal, and prospective data and analytics platform for breast cancer at a regional cancer center. The platform combines principles of data warehousing with natural language processing (NLP) to provide the integrated, timely, meaningful, high-quality, and actionable data required to establish a learning health system.

METHODS Data from six hospital information systems and one external data source were integrated nightly by automated extract, transform, and load (ETL) jobs. Free-text clinical documentation was processed using a commercial NLP engine.

RESULTS The platform contains 141 data elements for 7,019 patients with newly diagnosed breast cancer who received care at our regional cancer center from January 1, 2014, to June 3, 2022. Daily updating of the database takes an average of 56 minutes. Evaluation of the NLP jobs after tuning found high overall performance, with an F1 score of 1.0 for 19 variables and an F1 score above 0.95 for a further 16 variables.

CONCLUSION This study describes how data warehousing combined with NLP can be used to create a prospective data and analytics platform that enables a learning health system. Although the upfront time investment required to create the platform was considerable, daily data processing now runs automatically in less than an hour.
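NLP performance in this study is reported as per-variable F1 scores, where F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The following is a minimal sketch of one common way such scores could be computed when NLP output is compared with a manually abstracted reference standard. It is not the study's evaluation procedure. The matching convention, the zero-denominator handling, the data layout, and the variable names (`er_status`, `her2_status`) are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# One (gold, predicted) pair per patient; None means "no value recorded".
Pair = Tuple[Optional[str], Optional[str]]


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score_variable(pairs: Iterable[Pair]) -> Scores:
    """Score one NLP-extracted variable against a reference standard.

    Assumed counting convention (illustrative, not the study's protocol):
      * prediction equals the gold value      -> true positive
      * prediction differs from the gold value -> false positive, and also a
        false negative when the gold value is non-empty
      * no prediction but gold has a value     -> false negative
    """
    tp = fp = fn = 0
    for gold, pred in pairs:
        if pred is not None:
            if pred == gold:
                tp += 1
            else:
                fp += 1
                if gold is not None:
                    fn += 1
        elif gold is not None:
            fn += 1
    # Assumption: an empty denominator yields 0.0 rather than raising.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


def summarize(results: Dict[str, Scores]) -> Tuple[int, int]:
    """Count variables with F1 == 1.0 and with 0.95 < F1 < 1.0."""
    perfect = sum(1 for s in results.values() if s.f1 == 1.0)
    high = sum(1 for s in results.values() if 0.95 < s.f1 < 1.0)
    return perfect, high


if __name__ == "__main__":
    # Hypothetical reference-standard comparisons for two variables.
    evaluation = {
        "er_status": [("positive", "positive"), ("negative", "negative"),
                      ("positive", "positive")],
        "her2_status": [("positive", "positive"), ("negative", None),
                        ("negative", "negative")],
    }
    results = {name: score_variable(pairs) for name, pairs in evaluation.items()}
    for name, s in results.items():
        print(f"{name}: P={s.precision:.3f} R={s.recall:.3f} F1={s.f1:.3f}")
    print("F1 = 1.0: %d, 0.95 < F1 < 1.0: %d" % summarize(results))
```

Running this example prints F1 = 1.000 for `er_status` and F1 = 0.800 for `her2_status`, because one `her2_status` value present in the reference standard was missed (recall 0.667). A study would apply the same tally across all tuned variables to obtain counts such as "F1 of 1.0 for 19 variables". The exact rule for partial or wrong values is a design choice that should be stated alongside the scores.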
