Discovering Disease Associations by Integrating Electronic Clinical Data and Medical Literature

Electronic health record (EHR) systems offer an exceptional opportunity to study many diseases and their associated medical conditions within a population. The growing number of clinical records available electronically provides access to large, rich sets of patients' longitudinal medical information. By integrating and comparing relations found in EHRs with those already reported in the literature, we can verify known associations and identify rare or novel ones. Of particular interest is the identification of co-morbidities of rare diseases, where the small number of diagnosed patients makes robust statistical analysis difficult. Here, we introduce ADAMS, an Application for Discovering Disease Associations using Multiple Sources, which combines a range of statistical and natural language processing operations. We apply ADAMS to the New York-Presbyterian Hospital EHR, combining information from its relational diagnosis tables and textual discharge summaries with information from PubMed and Wikipedia, to investigate the co-morbidities of three rare diseases: Kaposi sarcoma, toxoplasmosis, and Kawasaki disease. In addition to recovering well-known characteristics of these diseases, ADAMS can identify rare or previously unreported associations. In particular, we report a statistically significant association between Kawasaki disease and a diagnosis of autistic disorder.
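The abstract does not say which statistical test ADAMS uses to call an association "statistically significant". As an illustration only, here is a minimal sketch of one common approach for small counts: build a 2x2 table of patients with and without each of two diagnoses, then apply a one-sided Fisher's exact test. The input format (sets of patient IDs) and the function names `contingency_table` and `fisher_right_tail` are hypothetical and are not taken from ADAMS.

```python
from math import comb


def contingency_table(patients_a, patients_b, total_patients):
    """Build a 2x2 table of patient counts for two diagnoses A and B.

    Hypothetical input: sets of patient IDs carrying each diagnosis,
    plus the size of the whole EHR population being considered.
    Returns (both, only_a, only_b, neither).
    """
    both = len(patients_a & patients_b)
    only_a = len(patients_a - patients_b)
    only_b = len(patients_b - patients_a)
    neither = total_patients - both - only_a - only_b
    return both, only_a, only_b, neither


def fisher_right_tail(both, only_a, only_b, neither):
    """One-sided Fisher's exact test.

    Returns P(co-occurrence count >= observed) assuming the two diagnoses
    are independent, using the hypergeometric distribution.
    """
    n_a = both + only_a                # patients with diagnosis A
    n_b = both + only_b                # patients with diagnosis B
    n = both + only_a + only_b + neither
    denom = comb(n, n_a)
    upper = min(n_a, n_b)
    # Sum probabilities of all tables at least as extreme as the observed one.
    numer = sum(comb(n_b, k) * comb(n - n_b, n_a - k) for k in range(both, upper + 1))
    return numer / denom


# Toy example: 20 patients, 4 with each diagnosis, 3 with both.
table = contingency_table({1, 2, 3, 4}, {1, 2, 3, 5}, total_patients=20)
print(table)                                 # (3, 1, 1, 15)
print(round(fisher_right_tail(*table), 4))   # 0.0134
```

An exact test is used here because, with rare diseases, cell counts are often too small for large-sample approximations such as the chi-squared test. Screening many disease pairs at once would also require a multiple-comparison correction (for example, Bonferroni), which this sketch omits.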
