Design and Implementation of a Scalable Network Monitoring System

Abstract

Monitoring systems give network administrators better visibility into, and understanding of, their networks. Amongst their many uses, they can be used to audit computing assets, profile resource usage, and pinpoint security problems.

Current monitoring systems have not explored the limits of scalability, preferring to focus on other important issues such as reliability and node discovery.

We present a monitoring system that scales to over 100,000 nodes. It has minimal local and global overhead, and it maintains integrity in the face of transient network failures. Through a hierarchical organisation, our monitoring system can operate across multiple administrative zones.

Since we did not have a large fleet of machines at our disposal, we simulated one and overlaid a smaller network of real machines on top of it. This combined system served as the testbed for our scalability evaluations.

In addition, we provide a web service interface that allows access to our system via HTTP. This frees consumers from having to implement special-purpose clients to interface with our system.

This work is part of a larger project, Panopticon, which is a complete monitoring solution that includes a database tier and a visualisation client.