Linguistic Taboos and Euphemisms in Nepali

Languages across the world have words, phrases, and behaviors, known as taboos, that speakers avoid in public communication because they are considered obscene or disturbing to the social, religious, and ethical values of society. Nevertheless, people deliberately use these linguistic taboos and other language constructs to make hurtful, derogatory, and obscene comments. It is nearly impossible to construct a universal set of offensive or taboo terms because offensiveness depends on factors such as the socio-physical setting, the speaker-listener relationship, and word choice. In this paper, we present a detailed corpus-based study of offensive language in Nepali. We identify and describe more than 18 categories of linguistic offense, including politics, religion, race, and sex. We discuss 12 common types of euphemism, such as synonym, metaphor, and circumlocution. In addition, we introduce a manually constructed data set of over 1000 offensive and taboo terms popular among contemporary speakers. This in-depth study and the accompanying resource provide a foundation for downstream tasks such as offensive language detection and language learning.
