Clinical code set engineering for reusing EHR data for research: A review

INTRODUCTION The construction of reliable, reusable clinical code sets is essential when reusing electronic health record (EHR) data for research. Yet code set definitions are rarely transparent, and they are almost never shared. Methodological standards for the management (construction, sharing, revision and reuse) of clinical code sets are lacking, and this gap needs to be addressed to ensure the reliability and credibility of studies that use code sets.

OBJECTIVE To review the methodological literature on the management of sets of clinical codes used in research on clinical databases, and to provide a list of best practice recommendations for future studies and software tools.

METHODS We performed an exhaustive search for methodological papers about clinical code set engineering for reusing EHR data in research. This was supplemented with papers identified by snowball sampling. In addition, a list of e-phenotyping systems was constructed by merging references from several systematic reviews on this topic, and the processes adopted by those systems for code set management were reviewed.

RESULTS Thirty methodological papers were reviewed. Common approaches included: creating an initial list of synonyms for the condition of interest (n=20); making use of the hierarchical nature of coding terminologies during searching (n=23); reviewing sets with clinician input (n=20); and reusing and updating an existing code set (n=20). Several open source software tools (n=3) were discovered.

DISCUSSION There is a need for software tools that enable users to create, revise, extend, review and share code sets easily and quickly. We provide a list of recommendations for the design and implementation of such tools.

CONCLUSION Research reusing EHR data could be improved through the further development, more widespread use and routine reporting of the methods by which clinical codes were selected.
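Two of the common approaches listed in the results, searching on an initial synonym list and exploiting the terminology hierarchy, can be illustrated with a minimal sketch. It is not taken from any of the reviewed papers or tools. The terminology, its codes and descriptions, and all function names below are hypothetical and invented for illustration; a real study would use a coding system such as Read or SNOMED CT, and the resulting candidate list would still need clinician review.

```python
from collections import defaultdict

# Hypothetical toy terminology: code -> (description, parent code or None).
# These codes are invented for illustration and belong to no real coding system.
TERMINOLOGY = {
    "D1": ("Diabetes mellitus", None),
    "D1A": ("Type 1 diabetes mellitus", "D1"),
    "D1B": ("Type 2 diabetes mellitus", "D1"),
    "D1C": ("Diabetic ketoacidosis", "D1"),
    "H1": ("Hypertension", None),
    "Z9": ("Diabetes screening declined", None),
}


def build_children(terminology):
    """Invert the parent links into a parent -> list of children map."""
    children = defaultdict(list)
    for code, (_, parent) in terminology.items():
        if parent is not None:
            children[parent].append(code)
    return children


def search_synonyms(terminology, synonyms):
    """Return codes whose description contains any synonym (case-insensitive)."""
    lowered = [s.lower() for s in synonyms]
    return {
        code
        for code, (description, _) in terminology.items()
        if any(s in description.lower() for s in lowered)
    }


def expand_descendants(seed_codes, children):
    """Add every descendant of the seed codes by walking the hierarchy."""
    result = set(seed_codes)
    stack = list(seed_codes)
    while stack:
        code = stack.pop()
        for child in children.get(code, []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def build_candidate_code_set(terminology, synonyms):
    """Synonym search followed by hierarchy expansion; output is for review."""
    seeds = search_synonyms(terminology, synonyms)
    return sorted(expand_descendants(seeds, build_children(terminology)))


candidates = build_candidate_code_set(TERMINOLOGY, ["diabetes"])
for code in candidates:
    print(code, TERMINOLOGY[code][0])
```

In this toy example the synonym search alone misses "Diabetic ketoacidosis", which is recovered only because it is a child of a matched code, while "Diabetes screening declined" matches the text but does not indicate the condition. That second case is the kind of candidate a clinician review step would be expected to exclude.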
