Validation of electronic health record phenotyping of bipolar disorder cases and controls.

OBJECTIVE: The study was designed to validate use of electronic health records (EHRs) for diagnosing bipolar disorder and classifying control subjects.

METHOD: EHR data were obtained from a health care system of more than 4.6 million patients spanning more than 20 years. Experienced clinicians reviewed charts to identify text features and coded data consistent or inconsistent with a diagnosis of bipolar disorder. Natural language processing was used to train a diagnostic algorithm with 95% specificity for classifying bipolar disorder. Filtered coded data were used to derive three additional classification rules for case subjects and one for control subjects. The positive predictive value (PPV) of EHR-based bipolar disorder and subphenotype diagnoses was calculated against diagnoses from direct semistructured interviews of 190 patients by trained clinicians blind to EHR diagnosis.

RESULTS: The PPV of bipolar disorder defined by natural language processing was 0.85. Coded classification based on strict filtering achieved a value of 0.79, but classifications based on less stringent criteria performed less well. No EHR-classified control subject received a diagnosis of bipolar disorder on the basis of direct interview (PPV=1.0). For most subphenotypes, values exceeded 0.80. The EHR-based classifications were used to accrue 4,500 bipolar disorder cases and 5,000 controls for genetic analyses.

CONCLUSIONS: Semiautomated mining of EHRs can be used to ascertain bipolar disorder patients and control subjects with high specificity and predictive value compared with diagnostic interviews. EHRs provide a powerful resource for high-throughput phenotyping for genetic and clinical research.
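
As an illustration of the validation metric, PPV is the share of subjects assigned to a class by the EHR rule whose class is confirmed by the reference interview, that is, TP / (TP + FP). The sketch below is a minimal Python example under stated assumptions: the function name and the toy label lists are hypothetical and invented for illustration. They are not the study's data, counts, or code.

```python
def positive_predictive_value(ehr_positive, interview_positive):
    """Return PPV = TP / (TP + FP).

    ehr_positive: booleans, True where the EHR rule assigned the class.
    interview_positive: booleans, True where the reference interview
        confirmed the class for the same subject.
    Both lists are hypothetical inputs, aligned subject by subject.
    """
    if len(ehr_positive) != len(interview_positive):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(ehr_positive, interview_positive))
    # True positives: EHR-assigned and confirmed by interview.
    tp = sum(1 for ehr, ref in pairs if ehr and ref)
    # False positives: EHR-assigned but not confirmed by interview.
    fp = sum(1 for ehr, ref in pairs if ehr and not ref)

    if tp + fp == 0:
        raise ValueError("no EHR-assigned subjects; PPV is undefined")
    return tp / (tp + fp)


# Invented toy data: 10 subjects flagged by the EHR rule, 8 of them
# confirmed at interview, plus 5 subjects the rule did not flag.
ehr_flags = [True] * 10 + [False] * 5
interview_flags = [True] * 8 + [False] * 2 + [False] * 5

ppv = positive_predictive_value(ehr_flags, interview_flags)
print(f"PPV = {ppv:.2f}")  # prints "PPV = 0.80"
```

The same function could be applied to the control rule by treating "classified as control" as the assigned class and "no bipolar disorder diagnosis at interview" as confirmation.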
