Methods to improve the quality of smoking records in a primary care EMR database: exploring multiple imputation and pattern-matching algorithms

Background: Primary care electronic medical record (EMR) data are emerging as a useful source for secondary uses, such as disease surveillance, health outcomes research, and practice improvement. These data capture clinical details about patients' health status, as well as behavioural risk factors, such as smoking. While the importance of documenting smoking status in a healthcare setting is recognized, the quality of smoking data captured in EMRs is variable. This study was designed to test methods aimed at improving the quality of patient smoking information in a primary care EMR database.

Methods: EMR data from community primary care settings extracted by two regional practice-based research networks in Alberta, Canada were used. Patients with at least one encounter in the previous 2 years (2016–2018) and having hypertension according to a validated definition were included (n = 48,377). Multiple imputation was tested under two different assumptions for missing data (smoking status is missing at random and missing not at random). A third method tested a novel pattern-matching algorithm developed to augment smoking information in the primary care EMR database. External validity was examined by comparing the proportions of smoking categories generated in each method with a general population survey.

Results: Among those with hypertension, 40.8% (n = 19,743) had either no smoking information recorded or it was not interpretable and considered missing. Those with missing smoking data differed statistically by demographics, clinical features, and type of EMR system used in the clinic. Both multiple imputation methods produced fully complete smoking status information, with the proportion of current smokers estimated at 25.3% (data missing at random) and 12.5% (data missing not at random). The pattern-matching algorithm classified 18.2% of patients as current smokers, similar to the population-based survey (18.9%), but still resulted in missing smoking information for 23.6% of patients. The algorithm was estimated to be 93.8% accurate overall, but accuracy varied by smoking status category.

Conclusion: Multiple imputation and algorithmic pattern-matching can be used to improve EMR data post-extraction, but the recommended method depends on the purpose of secondary use (e.g. practice improvement or epidemiological analyses).
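
The abstract does not describe the rules used by the pattern-matching algorithm, so the following is only a minimal illustrative sketch of the general idea: free-text smoking entries are scanned against ordered regular expressions and assigned a category, with unmatched entries left as missing. The category labels, the patterns, and the function name `classify_smoking` are all hypothetical and are not taken from the study.

```python
import re
from typing import Optional

# Hypothetical patterns, NOT the study's rules. They are checked in order and
# the first match wins, so "non-smoker" and "ex-smoker" are caught before the
# generic "smoker" pattern can label them as current smokers.
_PATTERNS = [
    ("never", re.compile(
        r"\b(never\s+smoke[dr]?|non[-\s]?smoker|denies\s+smoking)\b", re.I)),
    ("past", re.compile(
        r"\b(ex[-\s]?smoker|former\s+smoker|quit\s+smoking|stopped\s+smoking)\b",
        re.I)),
    ("current", re.compile(
        r"\b(current\s+smoker|smoker|smokes)\b", re.I)),
]


def classify_smoking(text: Optional[str]) -> Optional[str]:
    """Return 'never', 'past', or 'current', or None if nothing matches."""
    if not text or not text.strip():
        return None  # empty entry: smoking status stays missing
    for label, pattern in _PATTERNS:
        if pattern.search(text):
            return label
    return None  # uninterpretable entry: smoking status stays missing


if __name__ == "__main__":
    for entry in ["Non-smoker", "Ex smoker, quit 2010", "smokes daily", "1 ppd", ""]:
        print(repr(entry), "->", classify_smoking(entry))
```

Entries that match no pattern (such as the abbreviation "1 ppd" here) remain unclassified, which mirrors the reported finding that a rule-based approach can still leave a share of records with missing smoking information.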
