Computer-assisted expert case definition in electronic health records

PURPOSE To describe how computer-assisted presentation of case data can lead experts to infer machine-implementable rules for case definition in electronic health records. As an illustration, the technique was applied to derive a definition of acute liver dysfunction (ALD) in persons with inflammatory bowel disease (IBD).

METHODS The technique consists of repeatedly sampling new batches of case candidates from an enriched pool of persons meeting presumed minimal inclusion criteria, classifying the candidates both by a machine-implementable candidate rule and by a human expert, and then updating the rule so that it captures the new distinctions introduced by the expert. Iteration continues until an update results in an acceptably small number of changes; the rule at that point forms the final case definition.

RESULTS The technique was applied to structured data and to terms derived by natural language processing from text records in 29,336 adults with IBD. Over three rounds the technique led to rules with increasing predictive value, as the experts identified exceptions, and increasing sensitivity, as the experts identified missing inclusion criteria. In the final rule, inclusion and exclusion terms were often keyed to an ALD onset date. When compared against clinical review in an independent test round, the derived final case definition had a sensitivity of 92% and a positive predictive value of 79%.

CONCLUSION An iterative technique of machine-supported expert review can yield a case definition that accommodates available data, incorporates pre-existing medical knowledge, is transparent, and is open to continuous improvement. The expert updates to rules may be informative in themselves. In this limited setting, the final case definition for ALD performed better than previously published attempts using expert definitions.
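
The iterative procedure described under METHODS can be sketched in code. The following Python example is a minimal illustration under stated assumptions, not the authors' implementation: the rule representation (sets of inclusion and exclusion terms), the term names, the simulated expert, and the function names `update`, `refine` and `sensitivity_ppv` are all hypothetical. In particular, a real expert reviews presented case data, whereas here the expert is simulated by a hidden reference rule, and the stopping criterion is taken to be the number of batch classifications that an update changes.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple

Terms = FrozenSet[str]                       # codes / NLP-derived terms for one candidate
Expert = Callable[[Terms], Tuple[bool, Optional[str]]]


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule form: any inclusion term present and no exclusion term."""
    include: FrozenSet[str]
    exclude: FrozenSet[str]

    def classify(self, terms: Terms) -> bool:
        return bool(terms & self.include) and not (terms & self.exclude)


def update(rule: Rule, batch: List[Terms], expert: Expert) -> Rule:
    """Add the term the expert cites wherever the rule and the expert disagree."""
    include, exclude = set(rule.include), set(rule.exclude)
    for terms in batch:
        is_case, reason = expert(terms)
        if reason is None or rule.classify(terms) == is_case:
            continue
        # Missed case -> new inclusion term; false positive -> new exclusion term.
        (include if is_case else exclude).add(reason)
    return Rule(frozenset(include), frozenset(exclude))


def refine(pool: List[Terms], rule: Rule, expert: Expert, batch_size: int,
           max_changes: int, max_rounds: int, seed: int = 0) -> Rule:
    """Sample fresh batches and update the rule until an update changes few labels."""
    candidates = list(pool)
    random.Random(seed).shuffle(candidates)
    for _ in range(max_rounds):
        batch, candidates = candidates[:batch_size], candidates[batch_size:]
        if not batch:
            break
        updated = update(rule, batch, expert)
        changes = sum(rule.classify(t) != updated.classify(t) for t in batch)
        rule = updated
        if changes <= max_changes:
            break
    return rule


def sensitivity_ppv(rule: Rule, records: List[Terms], expert: Expert) -> Tuple[float, float]:
    """Sensitivity and positive predictive value of the rule against expert review."""
    pairs = [(rule.classify(t), expert(t)[0]) for t in records]
    tp = sum(r and e for r, e in pairs)
    fp = sum(r and not e for r, e in pairs)
    fn = sum(e and not r for r, e in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sens, ppv


# Toy data: term names and the hidden reference rule are invented for illustration.
REFERENCE = Rule(frozenset({"alt_elevated", "jaundice"}), frozenset({"chronic_hepatitis"}))


def simulated_expert(terms: Terms) -> Tuple[bool, Optional[str]]:
    is_case = REFERENCE.classify(terms)
    cited = terms & (REFERENCE.include if is_case else REFERENCE.exclude)
    return is_case, (sorted(cited)[0] if cited else None)


POOL = [frozenset(s) for s in (
    {"alt_elevated"}, {"alt_elevated", "chronic_hepatitis"},
    {"jaundice"}, {"abdominal_pain"},
)]
START = Rule(frozenset({"alt_elevated"}), frozenset())
FINAL = refine(POOL, START, simulated_expert, batch_size=4, max_changes=0, max_rounds=3)
```

In this toy run the sensitivity and positive predictive value are computed on the same records used to refine the rule, which is only a convenience; the abstract reports performance from an independent test round, and `sensitivity_ppv` would be applied to such a held-out batch in practice.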
