By providing a convenient way to access the Internet, smartphones have become a major platform for Social Networking Services (SNSs) such as Twitter and have driven their rapid growth. People can now conveniently check SNS messages posted by their friends and followers on their smartphones. As a consequence, they are exposed to spoilers of TV programs that they follow. So far, two previous works have explored spoiler detection in text, though not in SNS messages: (1) a keyword matching method and (2) a machine-learning method based on Latent Dirichlet Allocation (LDA). The keyword matching method evaluates most tweets as spoilers, hence its poor precision. The LDA-based method, although successful on long texts, works poorly on short segments of text such as those found on Twitter and evaluates most tweets as non-spoilers. This paper presents four features that are significant in the classification of spoiler tweets. Using those features, we classified spoiler tweets pertaining to a reality TV show (“Dancing with the Stars”). We experimentally compared our method with the previous methods; it achieved substantially higher precision than the keyword matching and LDA-based methods while maintaining comparable recall.
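The behavior of a keyword matching baseline can be illustrated with a minimal sketch. The tweets, labels, and keyword list below are hypothetical toy data rather than data from the experiments, and the helper names `keyword_match` and `precision_recall` are introduced only for this illustration. The sketch shows that a matcher which flags nearly every tweet mentioning the show can reach high recall while its precision stays low.

```python
def keyword_match(tweet, keywords):
    """Flag a tweet as a spoiler if it contains any keyword (case-insensitive)."""
    text = tweet.lower()
    return any(keyword in text for keyword in keywords)


def precision_recall(predicted, actual):
    """Compute precision and recall from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)       # true positives
    fp = sum(1 for p, a in pairs if p and not a)   # false positives
    fn = sum(1 for p, a in pairs if a and not p)   # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: (tweet text, is it actually a spoiler?)
tweets = [
    ("Can't wait for Dancing with the Stars tonight!", False),
    ("Who else is watching Dancing with the Stars?", False),
    ("Dancing with the Stars: the couple in red was just eliminated", True),
    ("Loved the costumes on Dancing with the Stars", False),
]
keywords = ["dancing with the stars"]  # assumed keyword list

predicted = [keyword_match(text, keywords) for text, _ in tweets]
actual = [label for _, label in tweets]
precision, recall = precision_recall(predicted, actual)

print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy set every tweet matches the keyword, so the single true spoiler is found but three non-spoilers are flagged as well.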