Learning behavioral fingerprints from Netflows using Timed Automata

We present a novel way to detect infected hosts and identify malware in networks by analyzing network communication statistics with state-of-the-art automata learning algorithms. The automata encode patterns of short-term interactions in known malicious hosts and are used to obtain small but effective fingerprints of machine behavior. We showcase the effectiveness of our system, named BASTA (Behavioral Analytics System using Timed Automata), on a public dataset containing Netflow traces of real-world botnet malware. Compared to deep packet inspection of communication content, Netflows are easy and cheap to collect and analyze, and they preserve a greater degree of privacy. Although the high level of abstraction in Netflow data makes it harder to use, BASTA achieves high accuracy in several settings while returning few false positives. It is also capable of detecting infections by previously unseen malware.
