Forecasting Rare Disease Outbreaks with Spatio-temporal Topic Models

The rapidly growing volume of news articles, tweets, and blogs is proving to be a valuable resource for anticipating, detecting, and forecasting significant societal events. In this paper, we focus on the problem of forecasting rare disease outbreaks and demonstrate how spatio-temporal topic models built over health-related newspaper articles can be used to forecast such outbreaks. More precisely, we present a novel framework that integrates topic models with one-class SVMs, so that modeling the evolution of the underlying topics and forecasting their prominence serves as a surrogate for near-term prediction of disease outbreaks. We demonstrate the effectiveness of the proposed technique using Hantavirus incidence data from multiple Latin American countries.
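To make the topic-model plus one-class SVM pipeline concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. It assumes scikit-learn's `LatentDirichletAllocation` as the topic model and `OneClassSVM` as the one-class classifier. It also assumes that each row of the input is a hypothetical word-count vector aggregating one week of articles for one region. The helper names (`fit_outbreak_detector`, `forecast`), the choice to train the SVM only on weeks labelled as preceding an outbreak, and all parameter values are illustrative assumptions. The sketch omits the spatio-temporal modeling of topic evolution and simply scores each week's topic mixture. The data are synthetic and stand in for real Hantavirus reports.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import OneClassSVM


def fit_outbreak_detector(counts, outbreak_weeks, n_topics=5, nu=0.2, seed=0):
    """Fit LDA on all weekly count vectors, then fit a one-class SVM on the
    topic mixtures of weeks labelled as preceding an outbreak.

    Assumption (hypothetical): outbreaks are rare, so only the positive
    (pre-outbreak) class is modeled, which is what a one-class SVM allows.
    """
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lda.fit(counts)
    theta = lda.transform(counts)  # per-week topic proportions; rows sum to 1
    ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    ocsvm.fit(theta[outbreak_weeks])
    return lda, ocsvm


def forecast(counts, lda, ocsvm):
    """Return True for weeks whose topic mixture falls inside the learned
    pre-outbreak region (OneClassSVM.predict returns +1 inside, -1 outside)."""
    return ocsvm.predict(lda.transform(counts)) == 1


# Synthetic illustration only: 52 hypothetical weekly count vectors over a
# 30-term vocabulary; weeks 10-13 and 40-43 are marked as pre-outbreak and
# receive extra counts on the last 10 (hypothetical disease-related) terms.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(52, 30))
outbreak_weeks = np.r_[10:14, 40:44]
counts[outbreak_weeks, 20:] += rng.poisson(6.0, size=(len(outbreak_weeks), 10))

lda, ocsvm = fit_outbreak_detector(counts, outbreak_weeks)
flags = forecast(counts, lda, ocsvm)  # boolean array, one entry per week
```

A one-class formulation is a natural fit here because labelled outbreak periods are scarce, while the many non-outbreak weeks do not form a single coherent class. In practice, a lead time would be built into the labels so that a flagged week signals a possible outbreak in the following weeks.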