Blog or block: Detecting blog bots through behavioral biometrics

Blog bots are automated scripts or programs that post comments to blog sites, often including spam or malicious links. An effective defense against automated form filling and posting by blog bots is to detect and verify human presence. Conventional detection methods usually require direct participation by human users, such as solving a CAPTCHA, which can be burdensome. In this paper, we present a new detection approach that uses behavioral biometrics, primarily mouse and keystroke dynamics, to distinguish humans from bots. Because it relies on passive monitoring, the proposed approach requires no direct user participation. We collect real user input data from a very active online community and blog site, and use this data to characterize the behavioral differences between humans and bots. The features most useful for classification form the basis of a detection system with two main components: a webpage-embedded logger and a server-side classifier. The logger records mouse movement and keystroke data while a user fills out a form and sends this data in batches to the server-side classifier, which labels the poster as human or bot. Our experimental results demonstrate an overall detection accuracy greater than 99%, with negligible overhead.
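
To make the logger-and-classifier pipeline concrete, the following is a minimal Python sketch of what the server side might do with one batch of logged events. It is an illustration under stated assumptions, not the system evaluated in the paper: the event records (`KeyEvent`, `MouseEvent`), the feature names, the function names, and the 1 ms jitter threshold are all hypothetical, and the hand-written rule stands in for a trained classifier.

```python
from dataclasses import dataclass
from math import hypot
from statistics import mean, pstdev


@dataclass
class KeyEvent:
    """One key press (hypothetical record format); times in milliseconds."""
    t_down: float
    t_up: float


@dataclass
class MouseEvent:
    """One sampled pointer position (hypothetical record format)."""
    t: float
    x: float
    y: float


def extract_features(keys, moves):
    """Summarize one batch of logged events as a small feature dictionary."""
    # Dwell time: how long each key was held down.
    dwell = [k.t_up - k.t_down for k in keys]
    # Gap: time between consecutive key-down events.
    gaps = [b.t_down - a.t_down for a, b in zip(keys, keys[1:])]
    # Pointer speed between consecutive samples, in pixels per millisecond.
    speeds = []
    for a, b in zip(moves, moves[1:]):
        dt = b.t - a.t
        if dt > 0:
            speeds.append(hypot(b.x - a.x, b.y - a.y) / dt)
    return {
        "n_keys": len(keys),
        "n_moves": len(moves),
        "dwell_mean": mean(dwell) if dwell else 0.0,
        "gap_std": pstdev(gaps) if gaps else 0.0,
        "speed_std": pstdev(speeds) if speeds else 0.0,
    }


def looks_like_bot(features, min_jitter_ms=1.0):
    """Toy rule (an assumption): flag batches with no input at all, or
    with keystroke timing that is almost perfectly regular."""
    no_input = features["n_keys"] == 0 and features["n_moves"] == 0
    robotic_typing = (
        features["n_keys"] > 2 and features["gap_std"] < min_jitter_ms
    )
    return no_input or robotic_typing


# Scripted input: evenly spaced, zero-length key presses and no mouse.
bot_keys = [KeyEvent(t, t) for t in (0.0, 10.0, 20.0, 30.0)]
bot_features = extract_features(bot_keys, [])

# Human-like input: irregular key timing plus some pointer movement.
human_keys = [KeyEvent(0, 85), KeyEvent(130, 200),
              KeyEvent(210, 300), KeyEvent(420, 495)]
human_moves = [MouseEvent(0, 10, 10), MouseEvent(16, 14, 12),
               MouseEvent(33, 30, 25)]
human_features = extract_features(human_keys, human_moves)
```

In a deployed system the feature dictionary would feed a classifier trained on labeled human and bot sessions rather than a fixed threshold; the rule above only shows where such a model would plug in.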
