Services Supporting Management of Distributed Applications and Systems

A distributed computing system consists of heterogeneous computing devices, communication networks, operating system services, and applications. As organisations move toward distributed computing environments, there will be a corresponding growth in distributed applications central to the enterprise. The design, development, and management of distributed applications present many difficult challenges. As these systems grow to hundreds or even thousands of devices, with a similar or greater number of software components, it will become increasingly difficult to manage them without appropriate support tools and frameworks. Further, without modelling tools and timely data on which to base design and configuration decisions, the design and deployment of additional applications and services will be ad hoc at best. This paper presents a framework for the management of distributed applications and systems. The framework is based on a set of common management services that support management activities: monitoring, control, configuration, and data repository services. A prototype system built on the framework is described; it implements and integrates management applications providing visualisation, fault location, performance monitoring and modelling, and configuration management. The prototype also demonstrates how the various management services can be implemented.
