Tweet Classification Using Sentiment Analysis Features and TF-IDF Weighting for Improved Flu Trend Detection

Social Networking Sites (SNS) such as Twitter are used by people of diverse ages. The high rate at which data are posted on SNS makes them an efficient resource for real-time analysis. SNS data can therefore be used effectively to track disease outbreaks and to issue warnings earlier than official agencies such as the U.S. Centers for Disease Control and Prevention (CDC). In this study, we show that sentiment analysis features and weighting techniques such as Term Frequency-Inverse Document Frequency (TF-IDF) can improve the accuracy of flu tweet classification. Several machine learning algorithms were evaluated for classifying tweets as either flu-related or unrelated, and the most accurate one was adopted. The results indicate that the proposed approach is useful for flu surveillance models and systems.
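The abstract does not spell out the feature pipeline, so what follows is only a minimal sketch of the general idea under stated assumptions. Tweets are tokenized on whitespace. TF-IDF uses the smoothed weighting idf(t) = ln((1 + N) / (1 + df(t))) + 1, one common variant (the default in scikit-learn's TfidfVectorizer). The sentiment feature is the mean polarity from a tiny hypothetical lexicon (`SENTIMENT_LEXICON`). A simple centroid classifier scored by dot product stands in for the algorithms the study actually compared. All function names and the four training tweets are illustrative and are not the study's code or data.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (assumption): polarity scores in [-1, 1].
# A real system would use an established sentiment tool instead.
SENTIMENT_LEXICON = {
    "sick": -1.0, "awful": -1.0, "miserable": -1.0, "fever": -0.5,
    "great": 1.0, "happy": 1.0,
}
PUNCT = ".,!?:;#@"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip(PUNCT) for t in text.lower().split())
    return [t for t in tokens if t]


def fit_idf(docs):
    """Smoothed IDF over tokenized docs: ln((1 + N) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    """Raw term count times IDF, L2-normalised; out-of-vocabulary terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sentiment(doc):
    """Mean lexicon polarity of the tokens, or 0.0 if none are in the lexicon."""
    scores = [SENTIMENT_LEXICON[t] for t in doc if t in SENTIMENT_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def features(doc, idf):
    """Sparse TF-IDF vector plus one sentiment feature under a reserved key."""
    vec = tfidf(doc, idf)
    vec["__sentiment__"] = sentiment(doc)
    return vec


def train_centroids(labelled_docs, idf):
    """Average the feature vectors of each class into one centroid per label."""
    sums, counts = {}, Counter()
    for doc, label in labelled_docs:
        acc = sums.setdefault(label, {})
        for term, value in features(doc, idf).items():
            acc[term] = acc.get(term, 0.0) + value
        counts[label] += 1
    return {label: {t: v / counts[label] for t, v in acc.items()}
            for label, acc in sums.items()}


def classify(doc, idf, centroids):
    """Pick the label whose centroid has the largest dot product with the tweet."""
    vec = features(doc, idf)

    def score(label):
        centroid = centroids[label]
        return sum(v * centroid.get(t, 0.0) for t, v in vec.items())

    return max(centroids, key=score)


# Illustrative training tweets (made up for this sketch, not the study's data).
TRAIN = [
    ("I have the flu and a fever, feeling awful", "flu"),
    ("Flu season: sick in bed with a fever", "flu"),
    ("Great game tonight, so happy", "other"),
    ("Happy Friday, great weather", "other"),
]
docs = [(tokenize(text), label) for text, label in TRAIN]
idf = fit_idf([doc for doc, _ in docs])
centroids = train_centroids(docs, idf)

print(classify(tokenize("Stuck at home sick with the flu"), idf, centroids))  # → flu
print(classify(tokenize("Great weather today"), idf, centroids))  # → other
```

Appending the sentiment score as one extra dimension lets the negative tone of a tweet like "sick with the flu" pull it toward the flu-related class even when its wording overlaps little with the training tweets. In practice the toy lexicon and the centroid rule would presumably be replaced by a proper sentiment tool and by whichever classifier performs best in the comparison.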
