Paper information - De-black-boxing health AI: demonstrating reproducible machine learning computable phenotypes using the N3C-RECOVER Long COVID model in the All of Us data repository.

Machine learning (ML)-driven computable phenotypes are among the most challenging to share and reproduce. Despite this difficulty, the urgent public health considerations around Long COVID make it especially important to ensure the rigor and reproducibility of Long COVID phenotyping algorithms so that they can be made available to a broad audience of researchers. As part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, researchers with the National COVID Cohort Collaborative (N3C) devised and trained an ML-based phenotype to identify patients with a high probability of having Long COVID. Supported by RECOVER, N3C and NIH's All of Us study partnered to reproduce the output of N3C's trained model in the All of Us data enclave, demonstrating that the model extends across multiple environments. This case study in ML-based phenotype reuse illustrates how open-source software best practices and cross-site collaboration can de-black-box phenotyping algorithms, prevent unnecessary rework, and promote open science in informatics.
