Similarity-based health risk prediction using Domain Fusion and electronic health records data

Electronic Health Record (EHR) data represent a valuable resource for individualized prospective prediction of health conditions. Statistical methods have been developed to measure patient similarity using EHR data, mostly based on clinical attributes. Only a handful of recent methods have combined clinical analytics with other forms of similarity analytics, and no unified framework yet exists to measure comprehensive patient similarity. Here, we developed a generic framework named Patient similarity based on Domain Fusion (PsDF). PsDF performs patient similarity assessment on each available data domain separately, and then integrates the affinity information across domains into a comprehensive similarity metric. We used the integrated patient similarity to support outcome prediction by assigning a risk score to each patient. In extensive simulations, PsDF outperformed existing risk prediction methods, including a random forest classifier, a regression-based model, and a naïve similarity method, especially when heterogeneous signals exist across different domains. Using PsDF and EHR data extracted from the data warehouse of Columbia University Irving Medical Center, we developed two clinical prediction tools for two different clinical outcomes: incident cases of end-stage kidney disease (ESKD) and severe aortic stenosis (AS) requiring valve replacement. We demonstrated that our new prediction method is scalable to large datasets, robust to random missingness, and generalizable to diverse clinical outcomes.
