Challenges for Signal Generation from Medical Social Media Data

Early detection of disease outbreaks is crucial for public health officials to react and report in time. To address this challenge, novel approaches and sources of information are currently being investigated. For example, data sources such as blogs and Twitter messages are becoming increasingly important for epidemiological surveillance. In traditional surveillance, statistical methods are used to relate reported case counts or other indicators to potential disease outbreaks. It is still unclear whether these methods can be applied to data collected from other sources, in particular to data extracted from unstructured text. This paper surveys existing methods for interpreting data for signal generation in public health. In particular, it summarizes the problems that must be addressed when applying these methods to social media data and outlines future steps.