Text Mining and Real-Time Analytics of Twitter Data: A Case Study of Australian Hay Fever Prediction

Social media platforms such as Twitter contain a wealth of user-generated data and have, over time, become a treasure trove of information for knowledge discovery, with applications in healthcare, politics, and social initiatives, to name a few. Despite the evident benefits of exploring tweets, processing such data poses numerous challenges owing to the specific characteristics of tweets. This study provides a brief overview of the steps involved in manipulating Twitter data and gives examples of the machine learning algorithms most commonly used in text analysis. It concludes with a case study on Australian hay fever prediction that applies selected techniques described in the overview. The case study demonstrates real-time Twitter analytics for health condition surveillance, using interactive visualisations to assist knowledge discovery and the dissemination of findings. The results demonstrate the potential of social media to play an important role in extracting meaningful results and guiding decision makers.
