Visual analytics of anomaly detection in large data streams

Most data streams are multi-dimensional and high-speed, and they carry massive volumes of continuous information. They arise in everyday applications such as telephone calls, retail sales, data center performance, and oil production operations. Many analysts want insight into the behavior of this data: they want to catch exceptions in flight, reveal the causes of anomalies, and take immediate action. To guide users in quickly finding anomalies in a large data stream, we derive a new automated neighborhood threshold marking technique called AnomalyMarker. The technique is built on cell-based data streams and user-defined thresholds, and it extends the scope of the data points around the threshold to include their surrounding areas. The idea is to define a focus area (the marked area) that enables users to (1) visually group the interesting data points related to the anomalies (i.e., problems that occur persistently or occasionally) in order to observe their behavior, and (2) discover the factors related to an anomaly by visualizing the correlations between the problem attribute and the attributes of nearby data items across the entire multi-dimensional data stream. Mining results are presented quickly in graphical form (e.g., tooltips) so that the user can zoom into the problem regions. We introduce several algorithms that attempt to optimize the size and extent of the anomaly markers. We have successfully applied this technique to detect anomalies in large real-world data streams from enterprise server performance monitoring and data center energy management.
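The neighborhood threshold marking idea can be illustrated with a minimal sketch. This is not the AnomalyMarker implementation, and several assumptions are ours: each cell holds one aggregated value per (item, time interval), a cell is anomalous when its value exceeds a single user-defined threshold, and the "surrounding area" is a square neighborhood of a hypothetical `radius`. The function names `mark_anomalies` and `group_markers` are hypothetical, and the size/extent optimization algorithms mentioned above are not reproduced.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, col): assumed to be (item, time interval)


def mark_anomalies(grid: List[List[float]], threshold: float, radius: int = 1) -> Set[Cell]:
    """Mark every cell within `radius` (square neighborhood) of a cell above `threshold`.

    Hypothetical helper: the real technique's neighborhood definition may differ.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    marked: Set[Cell] = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] > threshold:
                # Extend the mark from the anomalous cell to its surrounding area,
                # clipped to the grid boundaries.
                for dr in range(-radius, radius + 1):
                    for dc in range(-radius, radius + 1):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < rows and 0 <= nc < cols:
                            marked.add((nr, nc))
    return marked


def group_markers(marked: Set[Cell]) -> List[Set[Cell]]:
    """Split marked cells into 8-connected regions, one focus area per region."""
    remaining = set(marked)
    groups: List[Set[Cell]] = []
    while remaining:
        start = remaining.pop()
        group = {start}
        queue = deque([start])
        # Breadth-first search over the 8 neighbors of each marked cell.
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in remaining:
                        remaining.remove(nb)
                        group.add(nb)
                        queue.append(nb)
        groups.append(group)
    return groups


# Usage with a small made-up grid: two cells exceed the threshold of 5.
grid = [
    [1, 1, 1, 1, 1, 1],
    [1, 9, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 8],
]
marked = mark_anomalies(grid, threshold=5, radius=1)
groups = group_markers(marked)
print(len(marked), sorted(len(g) for g in groups))  # 13 [4, 9]
```

Grouping the marked cells into connected regions mirrors the goal of visually grouping the data points around each anomaly; a larger `radius` widens each focus area at the cost of merging nearby anomalies into a single marker.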
