Detecting Distributed Scans Using High-Performance Query-Driven Visualization

Modern forensic analytics applications, like network traffic analysis, perform high-performance hypothesis testing, knowledge discovery and data mining on very large datasets. One essential strategy to reduce the time required for these operations is to select only the most relevant data records for a given computation. In this paper, we present a set of parallel algorithms that demonstrate how an efficient selection mechanism, bitmap indexing, significantly speeds up a common analysis task, namely, computing conditional histograms on very large datasets. We present a thorough study of the performance characteristics of the parallel conditional histogram algorithms. As a case study, we compute conditional histograms for detecting distributed scans hidden in a dataset consisting of approximately 2.5 billion network connection records. We show that these conditional histograms can be computed on an interactive time scale (i.e., in seconds). We also show how to progressively modify the selection criteria to narrow the analysis and find the sources of the distributed scans.
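To make the idea of a conditional histogram driven by bitmap selection concrete, the following is a minimal, serial Python sketch. It is not the parallel algorithm studied in the paper: it assumes uncompressed bitmaps stored as Python integers, and the column names (`dst_port`, `state`, `src_host`), the toy records, and all function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def select(index, predicate):
    """OR together the bitmaps of every indexed value that satisfies the predicate."""
    mask = 0
    for value, bitmap in index.items():
        if predicate(value):
            mask |= bitmap
    return mask


def conditional_histogram(index, mask):
    """Count, per value of the histogrammed column, the rows that are also set in mask."""
    counts = {}
    for value, bitmap in index.items():
        n = bin(bitmap & mask).count("1")  # population count of the intersection
        if n:
            counts[value] = n
    return counts


# Hypothetical toy connection records (one list per column).
dst_port = [22, 80, 22, 443, 22, 80]
state = ["REJ", "OK", "REJ", "OK", "OK", "REJ"]
src_host = ["a", "b", "a", "c", "a", "d"]

port_idx = build_bitmap_index(dst_port)
state_idx = build_bitmap_index(state)
src_idx = build_bitmap_index(src_host)

# Selection: rejected connections to port 22, combined with a bitwise AND.
mask = select(port_idx, lambda p: p == 22) & select(state_idx, lambda s: s == "REJ")

# Histogram of source hosts restricted to the selected rows.
hist = conditional_histogram(src_idx, mask)
print(hist)
```

Narrowing the analysis, as the abstract describes, corresponds here to AND-ing further selection masks into `mask` and recomputing the histogram; only the index bitmaps are touched, never the raw records.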
