Probabilistic Inference on Twitter Data to Discover Suspicious Users and Malicious Content

While the power of social media on the Internet is undeniable, it has also become a major weapon for launching cyberattacks against organizations and their people. A growing number of cyberattacks are now launched through social media, for example by posting false content from hacked accounts or by posting malicious URLs to spread malware. In this paper, we present SocialKB, a simple and flexible unified framework for modeling social media posts and reasoning about them to ascertain their veracity, a first step towards discovering emerging cyber threats. SocialKB is based on Markov Logic Networks (MLNs), a popular representation in statistical relational learning. It learns a knowledge base (KB) over social media posts and users' behavior in a unified manner. By conducting probabilistic inference on the KB, SocialKB can identify suspicious users and malicious content. In this work, we focus specifically on tweets posted by users on Twitter. Finally, we report an evaluation of SocialKB on 20,000 tweets and discuss our early inference results.
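To make the idea of probabilistic inference over an MLN concrete, the following is a minimal sketch of exact marginal inference on a toy ground network. It is not SocialKB's actual KB or inference engine: the atoms (`malicious_link`, `attacker`), the evidence (`tweeted`, `flagged`), the three formulas, and their weights are all hypothetical, chosen only for illustration. Each possible world is scored by the exponential of the summed weights of the formulas it satisfies, and a marginal is the normalized score mass of the worlds in which the query atom is true.

```python
import itertools
import math

# Hypothetical evidence for one user and one link (assumed, not from the paper).
EVIDENCE = {"tweeted": True, "flagged": True}

# Unknown ground atoms whose truth values we want to infer.
QUERY_ATOMS = ["malicious_link", "attacker"]


def implies(a, b):
    """Material implication: a => b."""
    return (not a) or b


# Hypothetical weighted ground formulas: (weight, test on a world).
FORMULAS = [
    # A flagged link is probably malicious.
    (2.0, lambda w: implies(EVIDENCE["flagged"], w["malicious_link"])),
    # A user who tweeted a malicious link is probably an attacker.
    (1.5, lambda w: implies(EVIDENCE["tweeted"] and w["malicious_link"],
                            w["attacker"])),
    # Prior: most users are not attackers.
    (1.0, lambda w: not w["attacker"]),
]


def world_score(world):
    """Unnormalized MLN score: exp of the summed weights of satisfied formulas."""
    return math.exp(sum(wt for wt, holds in FORMULAS if holds(world)))


def all_worlds():
    """Enumerate every truth assignment to the query atoms."""
    for values in itertools.product([False, True], repeat=len(QUERY_ATOMS)):
        yield dict(zip(QUERY_ATOMS, values))


def marginal(atom):
    """Exact marginal probability that `atom` is true, by enumeration."""
    worlds = list(all_worlds())
    z = sum(world_score(w) for w in worlds)  # partition function
    return sum(world_score(w) for w in worlds if w[atom]) / z


if __name__ == "__main__":
    for atom in QUERY_ATOMS:
        print(atom, round(marginal(atom), 3))
```

Enumeration is exponential in the number of ground atoms, so it is feasible only for toy cases like this one; MLN systems rely on approximate or lifted inference for KBs of realistic size.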
