Infodemiology and Infoveillance: Innovative Methods and Tools to Measure, Track, and Analyze Population Health-Relevant Unstructured Data from the Internet and Social Media

Infodemiology can be defined as the science of distribution and determinants of information in an electronic medium, specifically the Internet, or in a population, with the ultimate aim to inform public health and public policy. “Infoveillance” is the longitudinal tracking of infodemiology metrics for surveillance and trend analysis. By “information” we mean unstructured, textual, openly accessible information produced and consumed by the public on the Internet. Our preliminary research, conducted primarily in the context of seasonal influenza and the H1N1 pandemic, suggests that collecting, mining, and continuously analyzing textual data from various open and proprietary Internet sources has significant potential to inform public health and public policy. Infodemiology data can be collected and analyzed in near real time. Various indexes and indicators can be constructed from these data which, analogous to stock indices, show real-time trends in sentiment, public opinion, public health-relevant behavior, and knowledge. Such indicators can also measure inequities and disparities in the availability of health information. Examples of infodemiology applications include: detecting and quantifying disparities in health information availability; analyzing queries from Internet search engines to predict disease outbreaks (e.g., influenza); monitoring people's status updates on microblogs such as Twitter for syndromic surveillance; identifying and monitoring public health-relevant publications on the Internet (e.g., anti-vaccination sites, but also news articles or expert-curated outbreak reports); using automated tools to measure information diffusion and knowledge translation; and tracking the effectiveness of health marketing campaigns. Moreover, analyzing how people search and navigate the Internet for health-related information, and how they communicate and share this information, can provide valuable insights into the health-related behavior of populations.
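One simple indicator of the kind described above is the daily share of posts that mention a concept, expressed per 1,000 posts. The sketch below makes the stock-index analogy concrete under stated assumptions. The function name `daily_index`, the per-1,000 scaling, and plain substring matching are all hypothetical illustrations, not the metrics actually used in this research.

```python
from collections import Counter
from datetime import date


def daily_index(posts, concept_terms):
    """Illustrative infodemiology indicator (hypothetical, not Infovigil's).

    posts: iterable of (day, text) pairs.
    concept_terms: terms defining the concept, e.g. vaccine-related words.
    Returns {day: matching posts per 1,000 posts}, ordered by day.
    """
    terms = [t.lower() for t in concept_terms]
    totals = Counter()  # all posts per day (the denominator)
    hits = Counter()    # posts per day mentioning any concept term
    for day, text in posts:
        totals[day] += 1
        lowered = text.lower()
        # Plain substring matching: "vaccine" also matches "vaccines".
        if any(term in lowered for term in terms):
            hits[day] += 1
    return {day: 1000 * hits[day] / totals[day] for day in sorted(totals)}


if __name__ == "__main__":
    sample = [
        (date(2009, 10, 1), "Got my H1N1 vaccine today"),
        (date(2009, 10, 1), "Lunch was great"),
        (date(2009, 10, 2), "Not taking the swine flu vaccine"),
        (date(2009, 10, 2), "Are vaccines safe for kids?"),
    ]
    # Prints 500.0 for 1 Oct (1 of 2 posts) and 1000.0 for 2 Oct (2 of 2).
    print(daily_index(sample, ["vaccine"]))
```

Dividing by the total number of posts per day, rather than reporting raw counts, keeps the indicator comparable when overall posting volume rises or falls.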
In this talk we will present an open-source toolkit (Infovigil) to monitor, track, archive, and visualize health information seeking and information provision patterns on the Internet. We will illustrate the potential of this approach with data from our H1N1 data-mining exercise. In this exercise we archived all tweets sent during the H1N1 pandemic that contained the keywords "H1N1", "swine flu", or "swineflu" (over 2 million tweets between May and December 2009). Among other sub-projects, we analyzed vaccination sentiment over time, identified frequently tweeted news articles, analyzed the social media strategies of public health agencies and hospitals, and evaluated the impact of individual and organizational Twitter users (as measured by retweets and other metrics). The Infovigil platform allows researchers and public health officials to set up analysis and tracking projects and to create dashboards for “all hazards” epidemic intelligence. We are looking for partners and funders to refine this vision.
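The keyword-based archiving and retweet-based impact measure described above can be sketched as follows. This is a minimal illustration that assumes tweets are available as plain strings and that a retweet can be recognized by the informal "RT @user" convention. The names `matches_keywords`, `retweet_counts`, `RT_PATTERN`, and the sample account names are hypothetical and do not reflect Infovigil's actual implementation.

```python
import re
from collections import Counter

# Archive keywords from the H1N1 exercise; matching is case-insensitive.
KEYWORDS = ("h1n1", "swine flu", "swineflu")

# Assumption: retweets follow the informal "RT @user" convention. This
# pattern is illustrative only; \w+ captures the username.
RT_PATTERN = re.compile(r"\bRT @(\w+)", re.IGNORECASE)


def matches_keywords(text):
    """True if the tweet contains any archive keyword (substring match)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def retweet_counts(tweets):
    """Count retweets per user among keyword-matching tweets.

    Only the first "RT @user" in a tweet is credited, i.e. the account
    the tweet was most immediately retweeted from.
    """
    counts = Counter()
    for text in tweets:
        if not matches_keywords(text):
            continue  # outside the archive's keyword filter
        match = RT_PATTERN.search(text)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "RT @HealthAgency: H1N1 vaccine clinics open today",
        "rt @healthagency swine flu update",
        "RT @someone: great lunch",
        "Worried about #swineflu",
    ]
    print(retweet_counts(sample))  # Counter({'healthagency': 2})
```

Raw retweet counts are only one impact signal. The other metrics mentioned above are not reproduced in this sketch.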