Ontology-driven aspect-based sentiment analysis classification: An infodemiological case study regarding infectious diseases in Latin America

Abstract: Infodemiology is the mining of unstructured textual data to provide public health officials and policymakers with valuable information about public health. The emergence of this previously unimaginable data source has opened up a new way to improve public health systems, resulting in better communication policies and better detection systems. However, the unstructured nature of the Internet, together with the complexity of the infectious disease domain, makes the extracted information difficult to understand. Moreover, for languages other than English, for which some of the most common Natural Language Processing resources are not available, exploiting this data correctly becomes even more difficult. We intend to fill these gaps by proposing an ontology-driven aspect-based sentiment analysis approach with which to measure the general public's opinions about infectious diseases when expressed in Spanish, using a case study of tweets concerning the Zika, Dengue and Chikungunya viruses in Latin America. Our proposal is based on two technologies. We first use ontologies to model the infectious disease domain with concepts such as risks, symptoms, transmission methods and drugs. We then measure the relationships between these concepts to determine the degree to which one concept influences the others. This new information is subsequently used to build an aspect-based sentiment analysis model based on statistical and linguistic features, which is done by applying deep-learning models. Our proposal is available on a web platform, where users can see the sentiment for each concept at a glance and analyse how each concept influences the sentiment of the others.
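The idea of measuring how strongly one ontology concept relates to another, and of reporting sentiment per concept, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which relatedness measure is used, so a simple shortest-path measure over a toy concept graph stands in for it, and the concept names, edges, polarity values and function names are hypothetical rather than taken from the authors' ontology or platform.

```python
from collections import deque

# Hypothetical toy ontology: undirected links between infectious-disease concepts.
# These edges are illustrative only and are not the authors' ontology.
ONTOLOGY_EDGES = [
    ("zika", "mosquito"),
    ("dengue", "mosquito"),
    ("zika", "fever"),
    ("dengue", "fever"),
    ("zika", "microcephaly"),
    ("mosquito", "repellent"),
]


def build_graph(edges):
    """Build an adjacency map from a list of undirected edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, source, target):
    """Return the number of edges on the shortest path, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def relatedness(graph, a, b):
    """Assumed path-based relatedness: 1 / (1 + shortest path length)."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def aspect_sentiment(labelled_mentions):
    """Average polarity per concept from (concept, polarity) pairs."""
    totals = {}
    for concept, polarity in labelled_mentions:
        total, count = totals.get(concept, (0.0, 0))
        totals[concept] = (total + polarity, count + 1)
    return {concept: total / count for concept, (total, count) in totals.items()}


graph = build_graph(ONTOLOGY_EDGES)
```

With this toy graph, `relatedness(graph, "zika", "dengue")` is driven by the two-edge path through `mosquito`, while `aspect_sentiment` simply averages the polarity labels attached to each concept. A real system would replace both the graph and the polarity labels with the ontology and the classifier output.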
