On the Analysis of Diseases and Their Related Geographical Data

Electronic medical records (EMRs) store information about patients collected during their stay in health facilities. The data stored in EMRs range from biological laboratory results to textual descriptions of diseases and the output of diagnostic devices (e.g., biomedical images). Each EMR is linked to a diagnosis related group (DRG) record, which is associated with a patient who has been treated in a hospital. A DRG record contains a code, called the major diagnostic category (MDC), which summarizes the treated disease and supports reimbursement of the costs of the patient's treatment during the stay. DRGs are used for administrative processes (e.g., cost and reimbursement management) as well as for disease monitoring. Associating diagnostic codes with external information (such as environmental and geographical data) and with information filtered from EMRs (e.g., biological results or analyte values) can be useful for monitoring the health status of citizens. We propose a methodology for analyzing such data based on a multistep process. First, we cross-reference the data using a semantics-based clustering procedure, extract information from EMRs, and cluster the records by looking for similar disease patterns. Then, the biological records in each disease cluster are analyzed to evaluate intracluster similarity by selecting analyte types and values. Finally, the biological data are related to diagnosis codes and geometrically projected onto areas of interest in order to map the patients identified as outliers. We applied the methodology to two case studies: 1) diagnosis codes and biochemical analytes from 20 000 biological analyses of patients hospitalized during one observation year and 2) the correlation between cardiovascular diseases and water quality in a southern Italian region. Preliminary findings suggest that the method is effective.
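The last two steps of the process (intracluster analysis of analyte values and mapping of outlier patients to areas) can be illustrated with a minimal sketch. It is not the authors' implementation: plain grouping by MDC code stands in for the semantics-based clustering, outliers are flagged with a modified z-score based on the median absolute deviation, and the record fields (`mdc`, `value`, `area`), the function names, and the threshold of 3.5 are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(records, threshold=3.5):
    """Return records whose analyte value is far from their group's median.

    Records are grouped by the hypothetical "mdc" field (a stand-in for the
    disease clusters). Within each group, a modified z-score based on the
    median absolute deviation (MAD) is computed for the "value" field.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["mdc"]].append(rec)

    outliers = []
    for recs in groups.values():
        values = [r["value"] for r in recs]
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            # No spread in this group, so no score can be computed.
            continue
        for r in recs:
            score = 0.6745 * (r["value"] - med) / mad
            if abs(score) > threshold:
                outliers.append(r)
    return outliers


def count_by_area(outliers):
    """Count flagged records per area (hypothetical "area" field)."""
    counts = defaultdict(int)
    for r in outliers:
        counts[r["area"]] += 1
    return dict(counts)


# Synthetic records, for illustration only.
records = [
    {"mdc": "05", "value": 1.0, "area": "A"},
    {"mdc": "05", "value": 1.1, "area": "A"},
    {"mdc": "05", "value": 0.9, "area": "B"},
    {"mdc": "05", "value": 1.2, "area": "B"},
    {"mdc": "05", "value": 0.8, "area": "A"},
    {"mdc": "05", "value": 5.0, "area": "B"},
]
flagged = flag_outliers(records)
per_area = count_by_area(flagged)
```

In a real setting the per-area counts would be joined to geographic boundaries and, as in the second case study, compared with environmental measurements such as water quality.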
