Development of an algorithm for determining smoking status and behaviour over the life course from UK electronic primary care records

Background: Patients’ smoking status is routinely collected by General Practitioners (GPs) in UK primary health care. There is an abundance of Read codes pertaining to smoking, including those relating to smoking cessation therapy, prescription, and administration codes, in addition to the more regularly employed smoking status codes. Large databases of primary care data are increasingly used for epidemiological analysis, and smoking status is an important covariate in many such analyses. However, the variable definition is rarely documented in the literature.

Methods: The Secure Anonymised Information Linkage (SAIL) databank is a repository for a national collection of person-based anonymised health and socio-economic administrative data in Wales, UK. GP smoking status data from the SAIL databank were explored to establish the range of codes available and how they could be used to identify different categories of smokers, ex-smokers and never smokers. An algorithm was developed which addresses inconsistencies and changes in smoking status recording across the life course, and its output was compared at individual level with smoking status as recorded in the Welsh Health Survey (WHS), 2013 and 2014. However, the WHS could not be regarded as a “gold standard” for validation.

Results: There were 6836 individuals in the linked dataset. Missing data were more common in GP records (6%) than in WHS (1.1%). Our algorithm assigns ex-smoker status to 34% of never-smokers, and detects 30% more smokers than are declared in the WHS data. When distinguishing between current smokers and non-smokers, the agreement between the WHS and GP data using the nearest date of comparison was κ = 0.78. When temporal conflicts had been accounted for, the agreement was κ = 0.64, showing the importance of addressing conflicts.

Conclusions: We present an algorithm for the identification of a patient’s smoking status using GP self-reported data. We have included sufficient details to allow others to replicate this work, thus raising the standard of documentation within this research area and of the assessment of smoking status in routine data.
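The agreement figures reported in the Results are Cohen's κ values. As an illustration only, the sketch below shows how κ could be computed for two paired sets of binary smoking labels. The function name `cohens_kappa` and the toy lists `gp_status` and `whs_status` are hypothetical; they are not taken from the SAIL or WHS data and do not reproduce the study's analysis.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: proportion of individuals with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal proportions, summed per category.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both sources use a single identical category; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data (illustrative assumption, not study data).
gp_status = ["smoker"] * 3 + ["non-smoker"] * 5 + ["smoker", "non-smoker"]
whs_status = ["smoker"] * 3 + ["non-smoker"] * 5 + ["non-smoker", "smoker"]

kappa = cohens_kappa(gp_status, whs_status)
print(f"kappa = {kappa:.2f}")
```

In this toy case the two sources agree on 8 of 10 individuals, but κ is lower than 0.8 because part of that agreement would be expected by chance given the marginal proportions of smokers in each source.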
