Humans and Bots in Internet Chat: Measurement, Analysis, and Automated Classification

The abuse of chat services by automated programs, known as chat bots, poses a serious threat to Internet users. Chat bots target popular chat networks to distribute spam and malware. In this paper, we first conduct a series of measurements on a large commercial chat network. Our measurements capture a total of 16 different types of chat bots, ranging from simple to advanced. Moreover, we observe that human behavior is more complex than bot behavior. Based on the measurement study, we propose a classification system to accurately distinguish chat bots from human users. The proposed classification system consists of two components: 1) an entropy-based classifier; and 2) a Bayesian-based classifier. The two classifiers complement each other in chat bot detection: the entropy-based classifier is more accurate in detecting unknown chat bots, whereas the Bayesian-based classifier is faster in detecting known chat bots. Our experimental evaluation shows that the proposed classification system is highly effective in differentiating bots from humans.
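The two components can be illustrated with a minimal sketch. The abstract does not specify the features, estimators, or thresholds used, so everything below is an assumption made for illustration: the entropy part computes plain Shannon entropy over binned inter-message delays (the `bin_width` parameter is hypothetical), and the Bayesian part is a generic multinomial naive Bayes over whitespace tokens with add-one smoothing. The names `shannon_entropy`, `NaiveBayesText`, and `log_odds_bot` are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width=1.0):
    """Shannon entropy (in bits) of values grouped into fixed-width bins.

    Low entropy suggests regular, machine-like behavior; higher entropy
    suggests the more complex behavior the abstract attributes to humans.
    """
    bins = Counter(int(v // bin_width) for v in values)
    total = sum(bins.values())
    return -sum((c / total) * math.log2(c / total) for c in bins.values())


class NaiveBayesText:
    """Multinomial naive Bayes over whitespace tokens (illustrative only)."""

    def __init__(self):
        self.word_counts = {"bot": Counter(), "human": Counter()}
        self.doc_counts = Counter()

    def train(self, label, message):
        # label is assumed to be either "bot" or "human"
        self.doc_counts[label] += 1
        self.word_counts[label].update(message.lower().split())

    def log_odds_bot(self, message):
        """Positive result: message looks more like known bot text."""
        vocab = set(self.word_counts["bot"]) | set(self.word_counts["human"])
        score = math.log(self.doc_counts["bot"] / self.doc_counts["human"])
        for label, sign in (("bot", 1), ("human", -1)):
            total = sum(self.word_counts[label].values())
            for word in message.lower().split():
                # add-one smoothing so unseen words do not zero out the score
                p = (self.word_counts[label][word] + 1) / (total + len(vocab))
                score += sign * math.log(p)
        return score
```

In this sketch the entropy measure needs no labeled bot text, which is consistent with its role in catching unknown bots, while the naive Bayes scorer relies on previously labeled messages and can score a single message immediately, which is consistent with fast detection of known bots.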
