Detecting Denial-of-Service Attacks from Social Media Text: Applying NLP to Computer Security

This paper describes a novel application of NLP models to detect denial-of-service attacks using only social media as evidence. Individual networks are often slow to report attacks, so a detection system built on public data could better support the response to a broad attack across multiple services. We explore NLP methods that use social media as an indirect measure of network service status. We describe two learning frameworks for this task: a feed-forward neural network and a partially labeled LDA model. Both models outperform previous work by significant margins (20% F1 score). We further show that the topic-based model enables the first fine-grained analysis of how the public reacts to ongoing network attacks, discovering multiple “stages” of observation. This is the first model that both detects network attacks (with the best performance) and provides an analysis of when and how the public interprets service outages. We describe the models, present experiments on the largest Twitter DDoS corpus to date, and conclude with an analysis of public reactions based on the learned model’s output.
