Slang Detection and Identification

The prevalence of informal language such as slang presents challenges for natural language systems, particularly in the automatic discovery of flexible word usages. Previous work has explored slang in terms of dictionary construction, sentiment analysis, word formation, and interpretation, but little research has addressed the basic problem of slang detection and identification. We examine the extent to which deep learning methods support automatic detection and identification of slang in natural sentences, using a combination of bidirectional recurrent neural networks, conditional random fields, and multilayer perceptrons. We test these models with a comprehensive set of linguistic features on two tasks: sentence-level detection and token-level identification of slang. We find that a prominent feature of slang is the surprising use of words across syntactic categories, or syntactic shift (e.g., verb-noun). Our best models detect the presence of slang at the sentence level with an F1-score of 0.80 and identify its exact position at the token level with an F1-score of 0.50.
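To make the difference between the two evaluation settings concrete, the following is a minimal sketch of how an F1-score might be computed at each granularity. It rests on assumptions that are not taken from the paper: the toy labels are invented, the helper names (`f1_score`, `sentence_labels`, `flatten`) are hypothetical, and a sentence is treated as positive whenever any of its tokens is tagged as slang. It does not reproduce the reported scores.

```python
from typing import List


def f1_score(gold: List[int], pred: List[int]) -> float:
    """Binary F1 for the positive class (1 = slang)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sentence_labels(token_labels: List[List[int]]) -> List[int]:
    """Assumption: a sentence counts as slang if any token is tagged as slang."""
    return [int(any(sentence)) for sentence in token_labels]


def flatten(token_labels: List[List[int]]) -> List[int]:
    """Concatenate per-sentence token labels into one sequence."""
    return [label for sentence in token_labels for label in sentence]


# Invented toy data: one inner list per sentence, one 0/1 label per token.
gold_tokens = [[0, 0, 1, 0], [0, 0, 0], [1, 0, 0, 0, 0]]
pred_tokens = [[0, 0, 1, 0], [0, 1, 0], [0, 1, 0, 0, 0]]

# Sentence-level detection: is there any slang in the sentence?
sentence_f1 = f1_score(sentence_labels(gold_tokens), sentence_labels(pred_tokens))

# Token-level identification: is the exact slang position found?
token_f1 = f1_score(flatten(gold_tokens), flatten(pred_tokens))

print(f"sentence-level F1: {sentence_f1:.2f}")
print(f"token-level F1:    {token_f1:.2f}")
```

In this toy case the third sentence is counted as a correct detection at the sentence level even though the predicted slang token is in the wrong position, which is one reason token-level identification tends to be the harder of the two tasks.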
