Exploratory Analysis of Marketing and Non-marketing E-cigarette Themes on Twitter

Electronic cigarettes (e-cigs) have been gaining popularity and have emerged as a controversial tobacco product since their introduction in the U.S. in 2007. The smoke-free aspect of e-cigs renders them less harmful than conventional cigarettes and is one of the main reasons for their use by people who plan to quit smoking. The U.S. Food and Drug Administration (FDA) introduced new regulations in early May 2016 that went into effect on August 8, 2016. Given this context, we report the results of a project to identify current themes in e-cig tweets through semantic interpretation of topics generated with topic modeling. Because marketing/advertising tweets constitute almost half of all e-cig tweets, we first build a classifier that separates marketing from non-marketing tweets, trained on a hand-built dataset of 1000 tweets. After applying the classifier to a dataset of over a million tweets (collected between April 2015 and June 2016), we conduct a preliminary content analysis and run topic models on the two sets of tweets separately, choosing the number of topics for each set using topic coherence. We interpret the results by relating the generated topics to specific e-cig themes. We also report on themes identified in e-cig tweets posted from particular places (such as schools and churches), using the GeoNames API to resolve the locations of geo-tagged tweets in our dataset. To our knowledge, this is the first effort that employs topic modeling to identify e-cig themes, both in general and in the context of geo-tagged tweets tied to specific places of interest.
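
The two-stage pipeline described above (classify tweets as marketing or non-marketing, then fit a topic model to each group separately) can be illustrated with a minimal sketch. The abstract does not specify the classifier, features, or topic model used, so everything here is an assumption made for illustration: TF-IDF features with logistic regression, LDA from scikit-learn, a handful of made-up tweets, and hypothetical function names. Selecting the number of topics by topic coherence is not shown.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up tweets standing in for the hand-labeled set (1 = marketing).
labeled = [
    ("50% off vape starter kits, free shipping today", 1),
    ("buy e-juice online, use coupon code for discount", 1),
    ("new e-cig flavors in stock, shop now and save", 1),
    ("win a free vape mod, enter our giveaway", 1),
    ("i quit smoking thanks to my e-cig", 0),
    ("fda regulations on vaping worry me", 0),
    ("are e-cigs really safer than cigarettes", 0),
    ("my friend was vaping at school again", 0),
]


def train_marketing_classifier(examples):
    """Fit an assumed TF-IDF + logistic regression classifier."""
    texts = [text for text, _ in examples]
    labels = [label for _, label in examples]
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(texts, labels)
    return clf


def split_by_prediction(clf, tweets):
    """Partition tweets into (marketing, non-marketing) by predicted label."""
    preds = clf.predict(tweets)
    marketing = [t for t, p in zip(tweets, preds) if p == 1]
    other = [t for t, p in zip(tweets, preds) if p == 0]
    return marketing, other


def top_terms_per_topic(tweets, n_topics=2, n_terms=5, seed=0):
    """Fit LDA on one group of tweets and return the top terms of each topic."""
    vec = CountVectorizer(stop_words="english")
    counts = vec.fit_transform(tweets)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lda.fit(counts)
    # Vocabulary terms ordered by their column index in the count matrix.
    vocab = sorted(vec.vocabulary_, key=vec.vocabulary_.get)
    return [
        [vocab[i] for i in weights.argsort()[::-1][:n_terms]]
        for weights in lda.components_
    ]
```

In use, one would train on the labeled set, call `split_by_prediction` on the unlabeled collection, and then call `top_terms_per_topic` once per group, reading each list of top terms as a candidate theme to be interpreted manually.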
