We present a multilingual dictionary of dirty words. We have collected about 3,200 dirty words in several languages and built a database from them. English has the most entries, though some other languages, Japanese for instance, also have several hundred. Words are classified by general meaning, such as which part of the human anatomy they refer to. A word can also be assigned a nuance label indicating whether it is, for example, a cute word used when speaking to children, a very rude word, or a clinical term. The database is available online and will hopefully be enlarged over time. It has already been used in research on, for instance, automatic joke generation and emotion detection.
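As a rough illustration of the classification described above, one entry could be modelled as a record holding a word, its language, a general meaning category, and an optional nuance label. The sketch below is only an assumption about how such a record might look: the names `Entry`, `find`, and all field names and category values are hypothetical, and the sample entries are neutral placeholders rather than data from the actual database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """One dictionary entry (hypothetical schema, not the real database layout)."""
    word: str
    language: str                 # e.g. "en", "ja"
    meaning: str                  # general meaning category, e.g. "anatomy"
    nuance: Optional[str] = None  # e.g. "childish", "very rude", "clinical"


def find(entries: List[Entry],
         language: Optional[str] = None,
         meaning: Optional[str] = None,
         nuance: Optional[str] = None) -> List[Entry]:
    """Return entries matching every filter that is not None."""
    return [
        e for e in entries
        if (language is None or e.language == language)
        and (meaning is None or e.meaning == meaning)
        and (nuance is None or e.nuance == nuance)
    ]


# Placeholder entries; the words are stand-ins, not real dictionary content.
SAMPLE = [
    Entry("word_a", "en", "anatomy", "clinical"),
    Entry("word_b", "en", "anatomy", "childish"),
    Entry("word_c", "ja", "anatomy", "very rude"),
    Entry("word_d", "ja", "insult"),
]

clinical_english = find(SAMPLE, language="en", nuance="clinical")
japanese_entries = find(SAMPLE, language="ja")
```

Keeping the nuance label optional mirrors the statement that words can, but need not, carry one.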