Using Twitter to Examine Smoking Behavior and Perceptions of Emerging Tobacco Products

Background: Social media platforms such as Twitter are rapidly becoming key resources for public health surveillance applications, yet little is known about how informed Twitter users are about tobacco or what sentiment they hold toward it, especially with regard to the emerging tobacco control challenges posed by hookah and electronic cigarettes.

Objective: To conduct a content and sentiment analysis of tobacco-related Twitter posts and to build machine learning classifiers that detect tobacco-relevant posts and sentiment toward tobacco, with a particular focus on new and emerging products such as hookah and electronic cigarettes.

Methods: We collected 7362 tobacco-related Twitter posts at 15-day intervals from December 2011 to July 2012. Each tweet was manually classified using a triaxial scheme capturing genre, theme, and sentiment. Using the collected data, machine learning classifiers were trained to distinguish tobacco-related from irrelevant tweets and positive from negative sentiment, using Naïve Bayes, k-nearest neighbors, and Support Vector Machine (SVM) algorithms. Finally, phi contingency coefficients were computed between each pair of categories to discover emergent patterns.

Results: The most prevalent genres were first- and second-hand experience and opinion, and the most frequent themes were hookah, cessation, and pleasure. Among tweets mentioning tobacco, sentiment was more often positive (1939/4215, 46% of tweets) than negative (1349/4215, 32%) or neutral, even excluding the 9% of tweets categorized as marketing. Three separate metrics converged to support an emergent distinction between hookah and electronic cigarettes, which corresponded to positive sentiment, and traditional tobacco products and more general references, which corresponded to negative sentiment. These metrics were correlations between categories in the annotation scheme (φ(hookah, positive)=0.39; φ(e-cigs, positive)=0.19); correlations between search keywords and sentiment (χ²(4)=414.50, P<.001, Cramér's V=0.36); and the most discriminating unigram features for positive and negative sentiment, ranked by log odds ratio in the machine learning component of the study. In the automated classification tasks, SVMs using a relatively small number of unigram features (500) achieved the best performance in discriminating tobacco-related from unrelated tweets (F score=0.85).

Conclusions: The high prevalence of positive sentiment attests to the novel insights that Twitter offers for tobacco surveillance. This positive sentiment is correlated in complex ways with social image, personal experience, and recently popular products such as hookah and electronic cigarettes. Several apparent perceptual disconnects between these products and their health effects suggest opportunities for tobacco control education. Finally, machine classification of tobacco-related posts shows a promising edge over strictly keyword-based approaches, yielding an improved signal-to-noise ratio in Twitter data and paving the way for automated tobacco surveillance applications.
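The two association measures reported in the Results, the phi coefficient for a pair of binary annotation categories and Cramér's V for a larger keyword-by-sentiment table, can be illustrated with a minimal Python sketch. This is not the study's code: the function names and the toy counts are hypothetical, chosen only to show how each statistic is derived from a contingency table.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table [[a, b], [c, d]].

    For example: a = tweets with both categories, b and c = tweets with
    only one of them, d = tweets with neither.
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


def chi_square(table):
    """Pearson chi-square statistic and sample size for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table):
    """Cramér's V: chi-square rescaled to the range [0, 1]."""
    stat, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(stat / (n * k))


# Hypothetical toy counts, NOT data from the study.
toy_table = [[30, 10], [10, 30]]
toy_phi = phi_coefficient(30, 10, 10, 30)
toy_v = cramers_v(toy_table)
```

For a 2x2 table Cramér's V equals the absolute value of phi, which is why the two measures can be read on a comparable scale even though one is reported for category pairs and the other for the keyword-by-sentiment table.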
