Disassortativity of computer networks

Network data is ubiquitous in cyber-security applications. Accurately modelling such data allows the discovery of anomalous edges, subgraphs or paths, and is key to many signature-free cyber-security analytics. We present a recurring property of graphs originating from cyber-security applications that greatly affects the performance of standard 'off-the-shelf' techniques, yet is often treated as a 'corner case' in the mainstream literature on network data analysis. The property is that similarity, in terms of network behaviour, does not imply connectivity; in fact, the reverse is often true. We call this disassortativity. The phenomenon is illustrated using network flow data collected on an enterprise network, and we show how Big Data analytics designed to detect unusual connectivity patterns can be improved.
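
As a rough illustration of the property described above, the following sketch builds a small client-server graph and checks whether nodes with similar behaviour are connected. Everything in it is an assumption made for illustration: the toy edge list, the node names, and the use of Jaccard similarity of neighbour sets as a stand-in for "similarity in terms of network behaviour". It is not the method or the data used in the paper.

```python
from itertools import combinations

# Hypothetical toy network: four clients that each talk to the same two servers.
# Clients never talk to each other, and neither do the servers.
edges = {
    ("c1", "s1"), ("c1", "s2"),
    ("c2", "s1"), ("c2", "s2"),
    ("c3", "s1"), ("c3", "s2"),
    ("c4", "s1"), ("c4", "s2"),
}


def neighbours(node, edges):
    """Return the set of nodes sharing an (undirected) edge with `node`."""
    return {v if u == node else u for u, v in edges if node in (u, v)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def connected(u, v, edges):
    """True if u and v share an edge in either orientation."""
    return (u, v) in edges or (v, u) in edges


nodes = sorted({n for e in edges for n in e})
nbrs = {n: neighbours(n, edges) for n in nodes}

# Pairs of nodes that behave identically (same neighbour set).
similar_pairs = [
    (u, v) for u, v in combinations(nodes, 2)
    if jaccard(nbrs[u], nbrs[v]) == 1.0
]

# Of those behaviourally identical pairs, the ones that are actually linked.
linked_similar = [p for p in similar_pairs if connected(*p, edges)]

# Behavioural similarity across each observed edge.
edge_similarity = {e: jaccard(nbrs[e[0]], nbrs[e[1]]) for e in edges}

print(f"{len(similar_pairs)} identical-behaviour pairs, "
      f"{len(linked_similar)} of them connected")
print(f"max similarity across any edge: {max(edge_similarity.values())}")
```

In this toy graph, every pair of nodes with identical behaviour is unconnected, and every edge joins two nodes with no neighbours in common. A method that assumes similar nodes tend to link to each other would therefore find no group structure here, even though the client and server roles are clear-cut.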
