Detecting influenza states based on hybrid model with personal emotional factors from social networks

In this paper, we show how social media data can be used to detect and analyze real-world phenomena with several data mining techniques. We investigate the real-time flu detection problem and propose a flu state detection model with personal emotional factors and semantic information (Em-Flu model). First, we extract flu-related microblog posts automatically in real time using a hybrid model composed of a Support Vector Machine (SVM) with features extracted from a Restricted Boltzmann Machine (RBM). To overcome the 140-character limit on posts, in addition to sentiment-related features, semantic association rules are adopted as additional features, together with bag of words, negation words, degree adverbs, and a sentiment-word dictionary. For flu state detection at a specific location, we propose an unsupervised model based on personal emotional factors to determine the flu state of that location. For comparison, a supervised model is also built by adopting Conditional Random Fields (CRFs) to decide whether a poster has "really" caught the flu and which influenza stage the poster is in. In the supervised model, statistical methods and prior rules are then applied to obtain the flu state of specific locations by counting the number of microblog posts in each flu state. By considering personal emotional factors, spatial features, and temporal patterns of influenza, the performance of both the unsupervised and the supervised models is improved. The system can indicate when and where an influenza epidemic is more likely to occur. Across different experiments, the hybrid models are more robust and effective than state-of-the-art unsupervised and supervised models that consider only the number of posts.
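
The lexicon features named in the abstract (bag of words, negation words, degree adverbs, and a sentiment-word dictionary) might be computed roughly as in the following minimal Python sketch. It assumes posts are already segmented into tokens. The tiny English word lists are hypothetical stand-ins for the paper's dictionaries, and the rule that a negation flips, and a degree adverb scales, only the next sentiment word is an illustrative assumption rather than the paper's specification.

```python
from collections import Counter

# Hypothetical toy lexicons; the real dictionaries are not given in the abstract.
SENTIMENT = {"terrible": -2.0, "sick": -1.0, "better": 1.0, "great": 2.0}
NEGATIONS = {"not", "no", "never"}
DEGREE = {"very": 1.5, "slightly": 0.5, "extremely": 2.0}


def lexicon_features(tokens):
    """Return a dict of lexicon-based features for one tokenized post."""
    score = 0.0
    negated = False
    multiplier = 1.0
    for tok in tokens:
        if tok in NEGATIONS:
            negated = not negated  # a second negation cancels the first
        elif tok in DEGREE:
            multiplier *= DEGREE[tok]
        elif tok in SENTIMENT:
            value = SENTIMENT[tok] * multiplier
            score += -value if negated else value
            # Assumption: modifiers apply only to the next sentiment word.
            negated, multiplier = False, 1.0
    return {
        "sentiment_score": score,
        "n_negations": sum(tok in NEGATIONS for tok in tokens),
        "n_degree_adverbs": sum(tok in DEGREE for tok in tokens),
        "bag_of_words": Counter(tokens),
    }


feats = lexicon_features(["feeling", "very", "sick", "not", "better"])
print(feats["sentiment_score"])  # -2.5
```

Each returned dictionary could then be flattened into a vector, with the bag-of-words counts mapped onto a fixed vocabulary, before classification.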
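
The hybrid post classifier, an SVM trained on features learned by an RBM, can be sketched in NumPy as below. This is a minimal sketch under stated assumptions: a hand-written Bernoulli RBM (not scikit-learn's class) trained with one-step contrastive divergence (CD-1), whose hidden-unit probabilities feed a linear SVM trained with the Pegasos subgradient method. The solver, hyperparameters, and toy binary bag-of-words input are illustrative choices, not the paper's configuration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyBernoulliRBM:
    """Binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def fit(self, X, epochs=50):
        n = X.shape[0]
        for _ in range(epochs):
            # Positive phase: hidden activations driven by the data.
            h_prob = self.hidden_probs(X)
            h_sample = (self.rng.random(h_prob.shape) < h_prob).astype(float)
            # Negative phase: one Gibbs step down to the visible layer and back up.
            v_recon = sigmoid(h_sample @ self.W.T + self.b_v)
            h_recon = self.hidden_probs(v_recon)
            self.W += self.lr * (X.T @ h_prob - v_recon.T @ h_recon) / n
            self.b_v += self.lr * (X - v_recon).mean(axis=0)
            self.b_h += self.lr * (h_prob - h_recon).mean(axis=0)
        return self

    def transform(self, X):
        return self.hidden_probs(X)


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos subgradient descent on the hinge loss; y must be in {-1, +1}.

    The bias column is regularized along with the weights (a simplification).
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (Xb[i] @ w)
            w *= 1.0 - eta * lam
            if margin < 1.0:
                w += eta * y[i] * Xb[i]
    return w


def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0.0, 1, -1)


# Toy binary bag-of-words matrix: rows are posts, columns are vocabulary items.
X = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1], [0, 1, 0, 1]], dtype=float)
y = np.array([1, 1, -1, -1])  # +1 = flu-related, -1 = not flu-related

rbm = TinyBernoulliRBM(n_visible=4, n_hidden=3).fit(X)
H = rbm.transform(X)
w = train_linear_svm(H, y)
print(predict(w, H))
```

Feeding RBM hidden-unit probabilities rather than raw sparse counts to the SVM is the idea behind the hybrid; whether it helps on real microblog data depends on vocabulary size, number of hidden units, and training length, none of which are stated in the abstract.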
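
The abstract does not say how semantic association rules become features. One plausible reading, sketched below purely as an assumption, is to mine word-pair rules of the form {a} → {b} with minimum support and confidence from flu-related posts, then emit one binary feature per rule that fires in a post. The function names and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(posts, min_support=0.5, min_confidence=0.8):
    """Mine rules lhs -> rhs from word pairs.

    Support is the fraction of posts containing both words; confidence is
    count(lhs and rhs) / count(lhs).
    """
    n = len(posts)
    item_counts = Counter()
    pair_counts = Counter()
    for tokens in posts:
        items = sorted(set(tokens))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        # Check both directions of the pair.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, confidence))
    return sorted(rules)


def rule_features(tokens, rules):
    """One binary feature per rule: 1 if both sides of the rule appear in the post."""
    present = set(tokens)
    return [int(lhs in present and rhs in present) for lhs, rhs, _ in rules]


flu_posts = [
    ["fever", "cough", "headache"],
    ["fever", "cough"],
    ["cough", "tired"],
    ["fever", "cough", "tired"],
]
rules = mine_pair_rules(flu_posts)
print(rules)  # [('fever', 'cough', 1.0), ('tired', 'cough', 1.0)]
print(rule_features(["fever", "cough", "today"], rules))  # [1, 0]
```

In practice the rules would be mined on training posts only, and the resulting binary vector concatenated with the other features.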
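
The supervised model uses Conditional Random Fields to decide which influenza stage a poster is in. At inference time, a linear-chain CRF selects the highest-scoring label sequence with Viterbi decoding; the sketch below shows only that decoding step, over one poster's posts in time order. The stage names and the hand-set emission and transition scores are hypothetical; in a trained CRF they would come from learned feature weights.

```python
def viterbi(emissions, transitions, labels):
    """Return the best label sequence for a linear-chain model.

    emissions: list of dicts, emissions[t][label] = score of label at step t
    transitions: dict, transitions[(prev, cur)] = score of moving prev -> cur
    """
    # best[label] = (score of the best path ending in label, that path)
    best = {lab: (emissions[0][lab], [lab]) for lab in labels}
    for scores in emissions[1:]:
        new_best = {}
        for cur in labels:
            prev_score, prev_path = max(
                ((best[p][0] + transitions[(p, cur)], best[p][1]) for p in labels),
                key=lambda item: item[0],
            )
            new_best[cur] = (prev_score + scores[cur], prev_path + [cur])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical stages and scores, for illustration only.
LABELS = ["healthy", "sick", "recovering"]
TRANSITIONS = {(a, b): (0.0 if a == b else -1.0) for a in LABELS for b in LABELS}
TRANSITIONS[("recovering", "sick")] = -3.0  # relapse assumed unlikely

emissions = [
    {"healthy": 2.0, "sick": 0.0, "recovering": 0.0},
    {"healthy": 0.0, "sick": 2.0, "recovering": 0.5},
    {"healthy": 0.0, "sick": 1.0, "recovering": 1.2},
    {"healthy": 0.5, "sick": 0.0, "recovering": 2.0},
]
path = viterbi(emissions, TRANSITIONS, LABELS)
print(path)  # ['healthy', 'sick', 'recovering', 'recovering']
```

Training the CRF, that is, learning the weights that produce these scores, is a separate step not shown here.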
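
After each post is labeled, the supervised pipeline counts posts per flu state at each location and applies statistical methods and prior rules to assign a location-level state. The abstract does not give those rules, so the ratio-based rule, thresholds, and state names below are hypothetical.

```python
from collections import Counter, defaultdict


def location_states(labeled_posts, min_posts=5, epidemic_ratio=0.3, alert_ratio=0.1):
    """Map each location to a coarse flu state from per-state post counts.

    labeled_posts: iterable of (location, stage) pairs, where stage is a
    per-post label such as "healthy", "sick", or "recovering" (hypothetical).
    """
    counts = defaultdict(Counter)
    for location, stage in labeled_posts:
        counts[location][stage] += 1
    result = {}
    for location, c in counts.items():
        total = sum(c.values())
        if total < min_posts:
            result[location] = "insufficient data"
            continue
        # Posts from users judged to have actually caught the flu.
        infected = c["sick"] + c["recovering"]
        ratio = infected / total
        if ratio >= epidemic_ratio:
            result[location] = "epidemic"
        elif ratio >= alert_ratio:
            result[location] = "alert"
        else:
            result[location] = "normal"
    return result


posts = (
    [("Beijing", "sick")] * 4
    + [("Beijing", "healthy")] * 6
    + [("Shanghai", "sick")] * 1
    + [("Shanghai", "healthy")] * 7
    + [("Harbin", "sick")] * 2
    + [("Guangzhou", "healthy")] * 20
)
states = location_states(posts)
print(states)
# {'Beijing': 'epidemic', 'Shanghai': 'alert', 'Harbin': 'insufficient data', 'Guangzhou': 'normal'}
```

The minimum-count rule keeps locations with only a handful of posts from being labeled on noise.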
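
The unsupervised model relies on personal emotional factors, and both models benefit from spatial features and temporal patterns. The abstract gives no formulas, so the following is only one hypothetical illustration. Each flu-related post is weighted by the poster's negative-emotion intensity. The daily weighted total at a location is blended with neighboring locations (spatial feature) and compared with the location's own trailing average (temporal pattern), and a day is flagged when the blended signal exceeds that baseline by a chosen factor. The weighting scheme, `window`, blending weight `alpha`, and `factor` are all assumptions.

```python
def emotion_weighted_series(daily_posts):
    """daily_posts: one list per day of negative-emotion intensities in [0, 1]."""
    # Hypothetical weighting: each post counts 1 + intensity, so distressed posts count more.
    return [sum(1.0 + e for e in day) for day in daily_posts]


def flag_outbreak_days(local, neighbors, window=3, alpha=0.7, factor=1.5):
    """Flag days whose spatially blended signal exceeds factor x the trailing local mean.

    local: emotion-weighted daily series for the target location.
    neighbors: list of series (same length) for adjacent locations.
    """
    flags = []
    for t in range(len(local)):
        neighbor_mean = sum(s[t] for s in neighbors) / len(neighbors) if neighbors else local[t]
        blended = alpha * local[t] + (1 - alpha) * neighbor_mean
        history = local[max(0, t - window):t]
        if not history:
            flags.append(False)  # no baseline yet
            continue
        baseline = sum(history) / len(history)
        flags.append(blended > factor * baseline)
    return flags


local = emotion_weighted_series([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.5, 0.5, 1.0, 1.0, 0.0]])
neighbors = [[2.0, 2.0, 2.0, 6.0]]
print(local)  # [2.0, 2.0, 2.0, 8.0]
print(flag_outbreak_days(local, neighbors))  # [False, False, False, True]
```

Replacing the hypothetical intensity weight with 1.0 for every post reduces this to a plain count-based detector, which gives a quick way to compare against a baseline that considers only the number of posts.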
