Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. We mined data from several public Twitter endpoints to identify content relevant to healthcare providers and public health regulatory professionals. We began by compiling content related to electronic nicotine delivery systems (e-cigarettes), which had become popular alternatives to tobacco products. It became apparent that high-frequency tweeting entities, called bots, needed to be removed, since they spam messages, post advertisements, and fabricate testimonials. We constructed algorithms using natural language processing and machine learning to separate human accounts from automated ones with a high degree of accuracy. We found that the average number of hyperlinks per tweet, the average character dissimilarity among an individual’s tweets, and the rate at which an account introduces unique words were valuable attributes for identifying automated accounts. We performed 10-fold cross-validation and measured the performance of each set of tweet features at various bin sizes; the best configuration achieved 97% accuracy. These methods were used to isolate automated content related to the advertising of electronic cigarettes. We categorized a rich taxonomy of automated entities, including robots, cyborgs, and spammers, each with different measurable linguistic features. Electronic cigarette related posts were classified as automated or organic, and their content was investigated with a hedonometric sentiment analysis. The overwhelming majority (≈ 80%) were automated, and many of these were commercial in nature. Others used false testimonials that were sent directly to individuals as a personalized form of targeted marketing.
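As a rough illustration of the three account-level attributes named above, the following minimal sketch computes them for a list of tweets from one account. It is not the classifier used in this work: the function names, the URL pattern, the use of `difflib.SequenceMatcher` as the character dissimilarity measure, and the definition of the unique-word rate as distinct words divided by total words are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Assumed pattern for hyperlinks; real tweets may need a more careful matcher.
URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z0-9']+")


def avg_links_per_tweet(tweets):
    """Mean number of hyperlinks per tweet for one account."""
    return sum(len(URL_RE.findall(t)) for t in tweets) / len(tweets)


def avg_pairwise_dissimilarity(tweets):
    """Mean character dissimilarity over all pairs of an account's tweets.

    Dissimilarity is taken as 1 - SequenceMatcher ratio (an assumption;
    other string distances would serve the same purpose).
    """
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 0.0
    total = sum(1.0 - SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


def unique_word_rate(tweets):
    """Distinct words divided by total words, with hyperlinks removed."""
    seen = set()
    total = 0
    for t in tweets:
        for w in WORD_RE.findall(URL_RE.sub("", t.lower())):
            seen.add(w)
            total += 1
    return len(seen) / total if total else 0.0


def account_features(tweets):
    """Collect the three illustrative features for one account."""
    if not tweets:
        raise ValueError("need at least one tweet")
    return {
        "links_per_tweet": avg_links_per_tweet(tweets),
        "dissimilarity": avg_pairwise_dissimilarity(tweets),
        "unique_word_rate": unique_word_rate(tweets),
    }


# Toy, made-up examples: a repetitive promotional account and a varied one.
promo = [
    "Buy e-liquid now http://a.co/1",
    "Buy e-liquid now http://a.co/2",
    "Buy e-liquid now http://a.co/3",
]
organic = [
    "Finally quit smoking after ten years",
    "Rainy day, staying in with tea",
    "Anyone watching the game tonight?",
]
promo_features = account_features(promo)
organic_features = account_features(organic)
```

On these toy inputs the repetitive account has more links per tweet, lower dissimilarity, and a lower unique-word rate than the varied one, which is the direction of separation the features are meant to capture. Feature vectors like these could then be passed to any standard classifier and scored with 10-fold cross-validation.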
Many tweets advertised nicotine vaporizer fluid (e-liquid) in various “kid-friendly” flavors, including ‘Fudge Brownie’, ‘Hot Chocolate’, and ‘Circus Cotton Candy’, along with every imaginable fruit flavor; such flavorings were banned long ago in traditional tobacco products. Others offered free trials, as well as incentives for users to retweet and spread the post within their own networks. Some accounts also hosted prize giveaways, issuing raffle tickets in exchange for sharing the tweet. Given the large youth presence on this public social media platform, these findings were evidence that the marketing of electronic cigarettes needed considerable regulation. Twitter has since officially banned all electronic cigarette advertising on its platform.
Social media has the capacity to provide the healthcare industry with valuable feedback from patients, who reveal and express their medical decision-making process as well as self-reported quality of life indicators both during and after treatment. We have studied several active cancer patient populations who discuss their experiences with the disease as well as survivorship. We experimented with a Convolutional Neural Network (CNN) as well as logistic regression to classify tweets as patient related. This yielded a sample of 845 breast cancer survivor accounts to study over a 16-month period. We found positive sentiments regarding patient treatment, raising support, and spreading awareness. A large portion of the negative sentiment concerned political legislation that could result in the loss of healthcare coverage. We refer to these online public testimonies as “Invisible Patient Reported Outcomes” (iPROs), because they carry relevant indicators yet are difficult to capture by conventional means of self-reporting. Our methods can be readily applied across disciplines to obtain insights into the opinions of a particular group.
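Sentiment comparisons of this kind can be made with a hedonometric-style word average, in which each word carries a happiness score and a text's sentiment is the frequency-weighted mean of the scores of its words. The sketch below is illustrative only: the function name is hypothetical, the scores in `TOY_SCORES` are made-up placeholders rather than values from a real lexicon, and the 1 to 9 scale with a neutral band of 4 to 6 excluded is assumed here as a common hedonometer convention, not confirmed as the configuration used in this work.

```python
import re
from collections import Counter

# Made-up placeholder scores on an assumed 1 (sad) to 9 (happy) scale.
TOY_SCORES = {"love": 8.4, "support": 7.0, "cancer": 1.5, "the": 5.0}


def average_happiness(texts, word_scores, neutral_band=(4.0, 6.0)):
    """Frequency-weighted mean happiness of the scored words in `texts`.

    Words whose score falls strictly inside `neutral_band` are skipped so
    that emotionally neutral words do not wash out the signal. Returns
    None when no scored, non-neutral word is found.
    """
    low, high = neutral_band
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            score = word_scores.get(word)
            if score is None or low < score < high:
                continue
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(word_scores[w] * n for w, n in counts.items()) / total


# Toy usage: compare two small, made-up groups of tweets.
group_a = average_happiness(["Love the support"], TOY_SCORES)
group_b = average_happiness(["the cancer"], TOY_SCORES)
```

Comparing the averages of two corpora, such as automated versus organic tweets, or tweets from different time windows, then shows which group skews more positive under the chosen lexicon.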
Capturing iPROs and public sentiments from online communication can help inform healthcare professionals and regulators, leading to more connected and personalized treatment regimens. Social listening can provide valuable insights into public health surveillance strategies.