Ontology-based automatic identification of public health-related Turkish tweets

Social media analysis, such as the analysis of tweets, is a promising research topic for tracking public health concerns, including epidemics. In this paper, we present an ontology-based approach to automatically identify public health-related Turkish tweets. The system is based on a public health ontology that we constructed through a semi-automated procedure. As the last stage of ontology development, the ontology concepts are expanded through a linguistically motivated relaxation scheme, which increases the system's coverage once the ontology is integrated into it. The resulting lexical resource, which contains the terms corresponding to the ontology concepts, is used to filter the Twitter stream so that a subset consisting mostly of public health-related tweets is obtained. Experiments on two million genuine tweets yield promising precision rates. We have also implemented a Web-based interface through which public health staff can track the results of the identification system. The study thus makes both technical and practical contributions to the domain of public health.
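
The lexicon-based filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the sample terms, the function names, and in particular the prefix-matching rule used as a stand-in for the relaxation scheme are all assumptions made for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical sample terms (flu, fever, vaccine, epidemic); the actual
# lexicon derived from the ontology is not reproduced here.
LEXICON = {"grip", "ateş", "aşı", "salgın"}


def turkish_lower(text):
    # Python's default lower() mishandles the Turkish dotted/dotless I pair,
    # so map "I" -> "ı" and "İ" -> "i" before lowercasing the rest.
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(text):
    # Split on non-word characters; \w is Unicode-aware for str patterns.
    return re.findall(r"\w+", turkish_lower(text))


def is_health_related(tweet, lexicon=LEXICON):
    # Assumed relaxation: a token matches if it starts with a lexicon term,
    # so suffixed forms such as "gripten" still match "grip".
    return any(
        token.startswith(term)
        for token in tokenize(tweet)
        for term in lexicon
    )


def filter_stream(tweets, lexicon=LEXICON):
    # Keep only the tweets that contain at least one (relaxed) lexicon match.
    return [tweet for tweet in tweets if is_health_related(tweet, lexicon)]
```

Prefix matching is only one plausible way to cope with Turkish suffixation; it raises coverage at the cost of spurious matches on unrelated words that happen to share a prefix, which is why precision is the natural measure to report for such a filter.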
