Visual Subpopulation Discovery and Validation in Cohort Study Data

Epidemiology aims at identifying subpopulations of cohort participants that share common characteristics (e.g. alcohol consumption) to explain risk factors of diseases in cohort study data. These data contain information about the participants' health status gathered from questionnaires, medical examinations, and image acquisition. Due to the growing volume and heterogeneity of epidemiological data, the discovery of meaningful subpopulations is challenging. Subspace clustering can be leveraged to find subpopulations in large and heterogeneous cohort study datasets. In our collaboration with epidemiologists, we realized their need for a tool to validate discovered subpopulations. For this purpose, identified subpopulations should be searched for in independent cohorts to check whether the findings apply there as well. In this paper, we describe our interactive Visual Analytics framework S-ADVIsED for SubpopulAtion Discovery and Validation In Epidemiological Data. S-ADVIsED enables epidemiologists to explore and validate findings derived from subspace clustering. We provide a coordinated multiple view system, which includes a summary view of all subpopulations, detail views, and statistical information. Users can visually assess the quality of subspace clusters against different criteria. Furthermore, at the suggestion of epidemiologists, the intervals of the variables involved in a subspace cluster can be adjusted. We investigated the replication of a selected subpopulation with multiple variables in another population by considering different measurements. As a specific result, we observed that study participants exhibiting high liver fat accumulation deviate strongly from other subpopulations and from the total study population with respect to age, body mass index, thyroid volume and thyroid-stimulating hormone.
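
The validation idea described above (take a subpopulation defined by intervals on several variables, look for it in an independent cohort, and compare summary statistics) can be illustrated with a minimal Python sketch. It is not the S-ADVIsED implementation: the function names, the variable names (`age`, `bmi`, `liver_fat`), the interval bounds, and the toy records are all hypothetical and invented purely for illustration.

```python
from statistics import mean

# Hypothetical subpopulation description: one closed interval per variable,
# as a subspace cluster might be summarized. The bounds are made up.
SUBPOPULATION = {"age": (50, 70), "bmi": (30.0, 45.0)}


def in_subpopulation(participant, intervals):
    """Return True if every listed variable is present and inside its interval."""
    for var, (low, high) in intervals.items():
        value = participant.get(var)
        # Participants with a missing value cannot be assigned to the subpopulation.
        if value is None or not (low <= value <= high):
            return False
    return True


def summarize(cohort, intervals, outcome):
    """Size, share, and mean outcome of the subpopulation within one cohort."""
    members = [p for p in cohort if in_subpopulation(p, intervals)]
    values = [p[outcome] for p in members if p.get(outcome) is not None]
    return {
        "n": len(members),
        "share": len(members) / len(cohort) if cohort else 0.0,
        "outcome_mean": mean(values) if values else None,
    }


# Synthetic toy records, NOT real study data.
discovery = [
    {"age": 55, "bmi": 32.0, "liver_fat": 18.0},
    {"age": 62, "bmi": 35.5, "liver_fat": 22.0},
    {"age": 40, "bmi": 24.0, "liver_fat": 4.0},
    {"age": 68, "bmi": 27.0, "liver_fat": 6.0},
]
replication = [
    {"age": 58, "bmi": 31.0, "liver_fat": 16.0},
    {"age": 35, "bmi": 22.0, "liver_fat": 3.0},
    {"age": 66, "bmi": None, "liver_fat": 9.0},
    {"age": 52, "bmi": 38.0, "liver_fat": 20.0},
]

found = summarize(discovery, SUBPOPULATION, "liver_fat")
replicated = summarize(replication, SUBPOPULATION, "liver_fat")

# Adjusting an interval, here widening the BMI range, changes who is included.
widened = dict(SUBPOPULATION, bmi=(25.0, 45.0))
found_widened = summarize(discovery, widened, "liver_fat")
```

Comparing `found` with `replicated` shows whether the subpopulation has a similar size and outcome level in the second cohort. Recomputing the summary after changing an interval mimics, in a very reduced form, the interactive interval adjustment mentioned above.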
