Building Reference Datasets to Support Socialbots Detection

Reference datasets are databases that guide the development and evaluation of tools and methods in several areas of Computer Science. Information Security, in particular, has a notable need for detection and classification tools, which makes the availability of such datasets fundamental: the reference dataset serves as a standard against which a tool must be tested to characterize its accuracy. In this sense, the reference dataset is analogous to the primary standard of classical metrology, in that it provides the most trustworthy reference against which an object under evaluation can be compared. It is therefore important to develop methods that assure the quality of reference datasets. In the present work, we discuss the challenges faced by currently available datasets and propose directions towards the development of reliable ones. Finally, we propose a methodology for constructing reference datasets for Online Social Networks and present a case study on the construction of a Twitter dataset for the detection of social bots.
