TOSS: TONICS for operation support systems: system management using the World Wide Web and intelligent software agents

An enterprise-wide distributed computing environment consists of a variety of hardware and OS platforms running a wide range of mission-critical applications. These platforms may be interconnected over local area networks, wide area networks, or even the public Internet. The availability of the applications, platforms, and associated computing resources (e.g., CPU, disk space, memory, databases, and middleware) is critical to the business mission of the enterprise. This places demanding requirements on the design of system management procedures for such systems: continuous monitoring of the health of these resources, detection of potential problems, problem notification, and timely corrective action. The traditional approach to meeting these requirements has relied mostly on manual intervention by a system administrator who logs on to individual machines and issues a set of commands. This practice provides neither continuous monitoring nor timely detection, notification, or corrective action. Moreover, it is time-consuming, error-prone, and insecure. We present a novel approach to building a system management framework for mission-critical distributed systems. Our framework, TOSS, offers several advantages over existing approaches: continuous monitoring of critical computing resources by intelligent software agents; problem detection and notification by intelligent software agents; centralized control of applications through a Web browser; centralized configuration management through a Web browser; security; flexibility to monitor a wide range of platforms and applications; and interoperability with third-party management platforms.