Performance Monitoring and Analysis on Multi-cluster Parallel Jobs

A multi-cluster computing model is introduced. Based on an analysis of the features of multi-cluster systems, including flexible architecture and reconfigurability, performance monitoring and analysis methods and technologies applicable to this model are studied. A performance monitoring and analysis tool for parallel jobs is designed and implemented. The tool uses a dynamic performance analysis method and follows a distributed software design framework, with a high-cohesion, low-coupling structure. Operational results show that the tool works effectively in the multi-cluster computing model.