IP2Vec: Learning Similarities Between IP Addresses

IP addresses are a central part of packet- and flow-based network data. However, visualizing IP addresses and computing similarities between them is challenging due to the lack of a natural order. This paper presents IP2Vec, a novel similarity measure for IP addresses that builds on ideas from Word2Vec, a popular approach in text mining. The key idea is to learn similarities by extracting the context information available in network data: IP addresses are similar if they appear in similar contexts. Thus, IP2Vec is derived automatically from the given network data set. The proposed approach is evaluated experimentally on two public flow-based data sets. In particular, we demonstrate the effectiveness of IP2Vec for clustering IP addresses within a botnet data set. In addition, we use visualization methods to analyse the learned similarities in more detail. These experiments indicate that IP2Vec is well suited to capturing the similarity of IP addresses based on their network communications.
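The idea that IP addresses are similar if they appear in similar contexts can be illustrated with a small sketch. It assumes, hypothetically, that the context of a source IP address consists of the destination IP address, destination port and protocol of each of its flows. It then trains Word2Vec-style skip-gram embeddings with negative sampling in plain Python, using uniform noise sampling instead of Word2Vec's frequency-based noise distribution. The flow records, the choice of context fields and all names (`FLOWS`, `context_pairs`, `train`, `cosine`) are illustrative assumptions, not the IP2Vec reference implementation.

```python
import math
import random

# Hypothetical flow records: (src_ip, dst_ip, dst_port, protocol).
FLOWS = [
    ("10.0.0.1", "192.168.1.10", "80", "TCP"),
    ("10.0.0.2", "192.168.1.10", "80", "TCP"),
    ("10.0.0.1", "192.168.1.11", "443", "TCP"),
    ("10.0.0.2", "192.168.1.11", "443", "TCP"),
    ("10.0.0.3", "192.168.1.53", "53", "UDP"),
    ("10.0.0.4", "192.168.1.53", "53", "UDP"),
]


def context_pairs(flows):
    """Assumed context: a source IP co-occurs with the destination IP,
    destination port and protocol of each of its flows."""
    pairs = []
    for src, dst, port, proto in flows:
        for ctx in (dst, "port:" + port, "proto:" + proto):
            pairs.append((src, ctx))
    return pairs


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


def train(pairs, dim=8, epochs=200, lr=0.05, negatives=2, seed=0):
    """Skip-gram with negative sampling over (target, context) pairs."""
    rng = random.Random(seed)
    pairs = list(pairs)  # copy, since we shuffle in place
    targets = sorted({t for t, _ in pairs})
    contexts = sorted({c for _, c in pairs})
    # Separate input (target) and output (context) embeddings, as in Word2Vec.
    w_in = {t: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for t in targets}
    w_out = {c: [0.0] * dim for c in contexts}
    for _ in range(epochs):
        rng.shuffle(pairs)
        for t, c in pairs:
            v = w_in[t]
            noise = [x for x in contexts if x != c]
            # One observed context (label 1) plus uniformly drawn noise contexts (label 0).
            samples = [(c, 1.0)] + [(rng.choice(noise), 0.0) for _ in range(negatives)]
            grad_v = [0.0] * dim
            for ctx, label in samples:
                u = w_out[ctx]
                g = lr * (label - sigmoid(dot(v, u)))
                for i in range(dim):
                    grad_v[i] += g * u[i]  # uses u before its own update
                    u[i] += g * v[i]
            for i in range(dim):
                v[i] += grad_v[i]
    return w_in


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


emb = train(context_pairs(FLOWS))
print(cosine(emb["10.0.0.1"], emb["10.0.0.2"]))  # hosts with identical contexts
print(cosine(emb["10.0.0.1"], emb["10.0.0.3"]))  # hosts with disjoint contexts
```

Because 10.0.0.1 and 10.0.0.2 contact the same destinations on the same ports, their vectors should end up more similar to each other than to 10.0.0.3, which only sends DNS-like UDP traffic. The exact cosine values depend on the seed and the hyperparameters.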
