Opinion Mining for Measuring the Social Perception of Infectious Diseases. An Infodemiology Approach

Prior to the digital era, society's perception of the health system was gauged through face-to-face questionnaires and interviews. With this knowledge, governments and public organizations have designed effective action plans to improve people's quality of life. Nowadays, with the advent of computer networks, it is possible to reach a larger number of people at a lower cost and to analyze the collected data automatically. Infodemiology is the research discipline devoted to the study of health information on the Internet. In this work, we explore the reliability of Opinion Mining for measuring people's subjective perception of infectious diseases during periods of high contagion risk. Specifically, linguistic features, among other relevant data, were extracted from Spanish-language tweets posted in Ecuador at the end of 2017. The resulting model comprises the linguistic features most relevant for classifying texts about infectious diseases as positive or negative. In addition, the corpus used in this analysis has been made publicly available so that other researchers can use it in future experiments in this area. The results showed that Support Vector Machines achieved the best performance, with a precision of 86.5%.
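
The abstract describes the pipeline only at a high level: features are extracted from Spanish tweets, a classifier labels each text as positive or negative, and Support Vector Machines score best by precision. The block below is a minimal, hypothetical sketch of that kind of pipeline in plain Python, not the authors' implementation. The toy tweets, the stopword list, and the bag-of-words features are assumptions standing in for the published corpus and the paper's linguistic features. The linear SVM is trained with the Pegasos subgradient method (hinge loss plus L2 penalty, no bias term), which is an illustrative choice rather than the authors' toolkit. Precision is computed for the positive class as TP / (TP + FP).

```python
import random
import re
from collections import Counter

# Hypothetical toy data (NOT the authors' published corpus): Spanish tweets
# labelled +1 (favourable perception) or -1 (unfavourable perception).
TRAIN = [
    ("Gracias al ministerio por la campaña de vacunación", 1),
    ("Excelente trabajo de fumigación contra el dengue", 1),
    ("Muy buena atención en el hospital", 1),
    ("El brote de zika empeora y nadie hace nada", -1),
    ("Pésimo servicio, tengo miedo al contagio", -1),
    ("No hay medicinas para la chikungunya", -1),
]

# Small illustrative stopword list (an assumption); "no" is kept because
# negation carries polarity.
STOPWORDS = {"el", "la", "los", "las", "de", "del", "en", "y", "a", "al", "por"}


def features(text):
    """Bag-of-words counts; a stand-in for the paper's linguistic features."""
    # \w is Unicode-aware in Python 3, so accented words like "vacunación" stay whole.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def score(w, x):
    """Linear decision value <w, x> over sparse dict vectors."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(data, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style SGD."""
    rng = random.Random(seed)
    samples = [(features(text), y) for text, y in data]
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size 1 / (lambda * t)
            violated = y * score(w, x) < 1  # hinge loss is active inside the margin
            for f in w:  # L2 shrinkage step: w <- (1 - eta * lam) * w
                w[f] *= 1.0 - eta * lam
            if violated:  # subgradient step on the hinge loss
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 (positive) or -1 (negative) from the sign of the decision value."""
    return 1 if score(w, features(text)) >= 0 else -1


def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are truly positive: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0
```

For example, after `w = train_linear_svm(TRAIN)`, `predict(w, "gracias por la vacunación")` returns 1 and `predict(w, "miedo al zika")` returns -1, while `precision([1, 1, -1, -1], [1, -1, 1, -1])` gives 0.5 (one true positive, one false positive). The toy tweets deliberately share no content words, so they only exercise the mechanics. A figure like the reported 86.5% should be measured on tweets that were not used for training.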
