Validation of Electronic Health Record Phenotyping of Bipolar Disorder and Controls

Objective: To validate the use of electronic health records (EHRs) for the diagnosis of bipolar disorder (BD) and controls.

Methods: EHR data were obtained from a healthcare system of more than 4.2 million patients spanning more than 20 years. Chart review by experienced clinicians was used to identify text features and coded data consistent or inconsistent with a diagnosis of BD. Natural language processing (NLP) was used to train a diagnostic algorithm with 95% specificity for classifying BD. Filtered coded data were used to derive three additional classification rules for cases and one for controls. The positive predictive value (PPV) of EHR-based BD and subphenotype diagnoses was calculated against direct semi-structured interview diagnoses by trained clinicians blind to EHR diagnosis in a sample of 190 patients.

Results: The PPV of NLP-defined BD was 0.85. A coded classification based on strict filtering achieved a PPV of 0.79, but BD classifications based on less stringent criteria performed less well. None of the EHR-classified controls was given a diagnosis of BD on direct interview (PPV = 1.0). For most subphenotypes, PPVs exceeded 0.80. The EHR-based classifications were used to accrue 4500 BD cases and 5000 controls for genetic analyses.

Conclusions: Semi-automated mining of EHRs can be used to ascertain BD cases and controls with high specificity and predictive value compared to a gold-standard diagnostic interview. EHRs provide a powerful resource for high-throughput phenotyping for genetic and clinical research.
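The PPV reported above is the fraction of EHR-classified individuals whose classification was confirmed by the interview diagnosis. A minimal sketch of that calculation follows, assuming simple counts as input. The function names, the Wilson score interval, and the example counts are illustrative assumptions; they are not taken from the study, which does not describe its interval method here.

```python
from math import sqrt


def ppv(confirmed: int, flagged: int) -> float:
    """Positive predictive value: confirmed classifications / all classifications.

    `flagged` is the number of people the EHR rule classified (e.g. as BD),
    `confirmed` is how many of those the reference interview agreed with.
    """
    if flagged <= 0:
        raise ValueError("flagged must be a positive count")
    if not 0 <= confirmed <= flagged:
        raise ValueError("confirmed must lie between 0 and flagged")
    return confirmed / flagged


def wilson_interval(confirmed: int, flagged: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%).

    Chosen here only as a common option for small validation samples;
    it is an assumption, not the method used in the paper.
    """
    p = ppv(confirmed, flagged)
    denom = 1 + z**2 / flagged
    centre = (p + z**2 / (2 * flagged)) / denom
    half = z * sqrt(p * (1 - p) / flagged + z**2 / (4 * flagged**2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not the study's data.
    confirmed, flagged = 8, 10
    low, high = wilson_interval(confirmed, flagged)
    print(f"PPV = {ppv(confirmed, flagged):.2f} (approx. 95% CI {low:.2f} to {high:.2f})")
```

With a validation sample as small as the one in this example, the interval is wide, which is why a PPV estimate is best read together with the number of interviewed patients behind it.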
