Crowdbreaks: Tracking Health Trends Using Public Social Media Data and Crowdsourcing

In the past decade, tracking health trends using social media data has shown great promise, owing to the massive adoption of social media around the world and to increasingly powerful hardware and software for working with these new big data streams. At the same time, many challenging problems have been identified. First, there is often a mismatch between how rapidly online data can change and how rapidly algorithms are updated, so algorithms trained on past data have limited reusability because their performance decreases over time. Second, much of the work focuses on specific issues during a specific past period, even though public health institutions need flexible tools to assess multiple evolving situations in real time. Third, most tools providing such capabilities are proprietary systems with little algorithmic or data transparency, and thus little buy-in from the global public health and research community. Here, we introduce Crowdbreaks, an open platform that allows tracking of health trends through continuous crowdsourced labeling of public social media content. The system automates the typical workflow of data collection, filtering, labeling, and training of machine learning classifiers, and can therefore greatly accelerate the research process in the public health domain. This work describes the technical aspects of the platform, covering its current functionality and exploring future use cases and extensions.
