Context Prediction in the Social Web Using Applied Machine Learning: A Study of Canadian Tweeters

In this ongoing work, we present Grebe, a social data aggregation framework that extracts geo-fenced Twitter data to analyze user engagement with health and wellness topics. Grebe also provides visualization tools for analyzing temporal and geographical health trends. Grebe currently indexes over 18 million public tweets and is the first framework of its kind for Canadian researchers. We use this dataset to analyze three types of context: geographical context, by predicting user location with supervised learning; topical context, by identifying health-related tweets with several learning approaches; and affective context, by analyzing tweet sentiment with rule-based methods. For geographical context, we define user location as the position from which a user posts a tweet and evaluate with standard precision metrics, obtaining promising results for predicting provinces and cities from tweet text. For topical context, we adopt a broader definition of health based on the six dimensions of wellness model and evaluate against manually annotated documents, obtaining good results with both supervised and semi-supervised machine learning. For affective context, we use the indexed tweets to show current trends in emotions and opinions, and we demonstrate how polarity and emotions vary across Canadian provinces. Together, these contexts provide useful insights for digital epidemiology. Ultimately, Grebe aims to provide researchers with Canada-specific social web datasets through an open-source platform with an accessible RESTful API; this paper showcases Grebe's potential and reports our progress toward these goals.
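To make the geographical-context task concrete, the following is a minimal sketch of predicting a tweet's province from its text with a supervised classifier and scoring the predictions with per-class precision (TP / (TP + FP)). The abstract does not name the classifier or the tokenizer, so this sketch assumes a multinomial Naive Bayes model with add-one smoothing and whitespace tokenization; the function names (`train_nb`, `predict`, `precision_per_class`) and the toy tweets are hypothetical and are not Grebe's implementation or data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (assumed classifier)."""
    class_counts = Counter(labels)      # label -> number of training tweets
    word_counts = defaultdict(Counter)  # label -> token frequencies
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # skip tokens never seen in training
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_per_class(y_true, y_pred):
    """Precision = TP / (TP + FP) for every label that was predicted."""
    tp, predicted = Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        predicted[pred] += 1
        if truth == pred:
            tp[pred] += 1
    return {label: tp[label] / predicted[label] for label in predicted}


# Hypothetical toy data: tweets labelled with a province code.
train = [
    ("leafs game tonight in toronto", "ON"),
    ("traffic on the gardiner in toronto", "ON"),
    ("oilers win in edmonton tonight", "AB"),
    ("stampede week in calgary", "AB"),
]
test = [
    ("raptors parade in toronto", "ON"),
    ("snow in calgary again", "AB"),
    ("great oilers game", "AB"),
    ("game tonight", "AB"),
]

model = train_nb([t for t, _ in train], [label for _, label in train])
y_true = [label for _, label in test]
y_pred = [predict(model, t) for t, _ in test]
print(y_pred)                               # ['ON', 'AB', 'AB', 'ON']
print(precision_per_class(y_true, y_pred))  # {'ON': 0.5, 'AB': 1.0}
```

In this toy run the ambiguous tweet "game tonight" is assigned to Ontario, which lowers Ontario's precision to 0.5. A model trained on millions of geo-fenced tweets would use far richer features, and city-level prediction follows the same pattern with city labels in place of province codes.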
