What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask (Preprint)

Coincident with the tsunami of COVID-19–related publications, there has been a surge of studies using real-world data, including data obtained from the electronic health record (EHR). Unfortunately, several of these high-profile publications were retracted because of concerns about the soundness and quality of the studies and of the EHR data they purported to analyze. These retractions highlight that although a small community of EHR informatics experts can readily identify the strengths and flaws of EHR-derived studies, many medical editorial teams and otherwise sophisticated medical readers lack a framework for critically appraising them. Moreover, conventional statistical analyses are no substitute for an understanding of the opportunities and limitations of EHR-derived studies. Here we distill from the broader informatics literature six key considerations that are crucial for appraising studies that use EHR data: data completeness, data collection and handling (eg, transformation), data type (ie, codified or textual), robustness of methods against EHR variability (within and across institutions, countries, and time), transparency of data and analytic code, and a multidisciplinary approach. These considerations will inform researchers, clinicians, and other stakeholders of recommended best practices for reviewing manuscripts, grants, and other outputs of EHR-derived studies, and thereby foster rigor, quality, and reliability in this rapidly growing field.
