Making distributed applications manageable through instrumentation

The goal of a management system in a distributed computing environment is to provide a centralized and coordinated view of an otherwise distributed and heterogeneous collection of hardware and software resources. Management systems monitor, analyse and control network resources, system resources, and distributed application programs. Many organizations currently depend on mission-critical distributed applications, a trend that will grow as software engineering tools emerge that make distributed applications easier to construct. We believe that manageability must be built into distributed applications from the beginning rather than added in an ad hoc fashion after they have been developed. Just as design for usability, testability and maintainability is being addressed in the development process, so must design for manageability. Application manageability is a research issue of particular interest to us. The work described in this paper focuses on instrumenting processes so that they can respond to management requests, generate management reports, and maintain the information required by the management system. We present an instrumentation architecture to support this, a prototype implementation that includes a class library of standard instrumentation, and a methodology for instrumentation.
