Estimating County Health Indices Using Graph Neural Networks

Population health analytics is fundamental to developing responsive public health promotion programs. Health statistics are traditionally interpreted at the population level by analyzing data aggregated from individuals, typically collected through telephone surveys. Recent studies have found that social media can serve as an alternative population health surveillance system, providing quality and timely data at virtually no cost. In this paper, we further investigate the use of social media for population health estimation, using a graph neural network approach. Specifically, we first introduce a graph modeling method that represents each county as a graph of interactions between health-related features in the community. We then adopt a graph neural network model to learn the population health representation, followed by a regression layer that estimates the health indices. We validate the proposed method through large-scale experiments on Twitter data, predicting health indices of US counties. Empirical results show a significant correlation with the reported health statistics, with a Spearman correlation coefficient (\(\rho\)) of up to 0.69, and show that our graph-based approach outperforms existing methods. These promising results also suggest that graph-based models could be applied to a range of societal-level analytics tasks through social media.
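The pipeline described above (county graph, graph neural network, regression layer, Spearman evaluation) can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything here is an assumption made for illustration: a single GCN-style propagation step with symmetric normalization, a mean-pool readout over nodes, a linear regression head, and hypothetical names such as `estimate_index`. It is written in plain Python so that it runs without any external library and is not the authors' implementation.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (assumed GCN-style rule)."""
    n = len(adj)
    # Add self-loops so every node keeps its own features and has degree > 0.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(a_norm, h, w):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def estimate_index(adj, features, w_gcn, w_out, bias):
    """County graph -> node embeddings -> mean-pool readout -> linear regression."""
    h = gcn_layer(normalize_adjacency(adj), features, w_gcn)
    pooled = [sum(col) / len(h) for col in zip(*h)]  # assumed readout: mean over nodes
    return sum(p * w for p, w in zip(pooled, w_out)) + bias

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (undefined if either input is constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy county graph: two health-related features joined by one interaction edge.
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
prediction = estimate_index(adj, features, w_gcn=[[1.0]], w_out=[2.0], bias=0.5)
```

In practice the weights `w_gcn`, `w_out`, and `bias` would be learned by minimizing a regression loss over counties, and `spearman` would then compare the predicted indices against the reported health statistics across the held-out counties.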
