Panopticon: a scalable monitoring system

Monitoring systems are necessary for managing anything beyond the smallest networks of computers. Specialised monitoring systems can be deployed to detect specific problems, but more general systems are required to detect unexpected issues and to track performance trends. Although large fleets of computers are becoming more common, few existing general-purpose monitoring systems can scale to monitor such very large networks. The literature also lacks systems that cater for the visualisation of monitoring information on a large scale. Scale is therefore an issue in both the design and the presentation of large-scale monitoring systems. We discuss Panopticon, a monitoring system that we have developed, which can scale to monitor tens of thousands of nodes using only commodity equipment. In addition, we propose a novel method for visualising monitoring information on a large scale, based on general techniques for visualising massive multi-dimensional datasets. The monitoring system is shown to be able to collect information from up to 100 000 nodes. The storage system is able to record and output information from up to 25 000 nodes, and the visualisation is able to display all of this information simultaneously for up to 20 000 nodes. Optimisations to our storage system could allow it to scale a little further, but significant improvements in scalability would require a distributed storage approach combined with intelligent filtering algorithms.
