Paper information - Using cause-effect analysis to understand the performance of distributed programs

Using cause-effect analysis to understand the performance of distributed programs

Abstract Understanding the performance of distributed programs can be very difficult, since a program’s performance depends on characteristics of the application, the underlying hardware, the software environment, and interactions among all three. In this paper we present cause-effect analysis (CEA), a general approach to understanding distributed program performance that facilitates performance analysis, tuning, and prediction. Using detailed program traces gathered at execution time as input, CEA automatically generates explanations for important performance phenomena, identifying code segments that are responsible for the occurrence of the phenomena. We illustrate our approach by describing CEA techniques for three classes of overheads in distributed programs: contention, synchronization, and communication. Using the explanations produced by CEA, we are able to understand and minimize common performance problems in real applications including load imbalance, false sharing, and resource contention.

Virgílio A. F. Almeida | Thomas J. LeBlanc | Wagner Meira, Jr.

[1] Wagner Meira, et al. Waiting time analysis and performance visualization in Carnival, 1996, SPDT '96.

[2] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.

[3] Alan L. Cox, et al. TreadMarks: shared memory computing on networks of workstations, 1996.

[4] Jim Gray, et al. Benchmark Handbook: For Database and Transaction Processing Systems, 1992.

[5] Barton P. Miller, et al. Critical path analysis for the execution of parallel and distributed programs, 1988, Proceedings. The 8th International Conference on Distributed.

[6] Thomas J. LeBlanc, et al. Analyzing Parallel Program Executions Using Multiple Views, 1990, J. Parallel Distributed Comput.

[7] Thomas J. LeBlanc, et al. Debugging Parallel Programs with Instant Replay, 1987, IEEE Transactions on Computers.

[8] Jong-Deok Choi, et al. A mechanism for efficient debugging of parallel programs, 1988, PADD '88.

[9] James R. Larus, et al. StormWatch: a tool for visualizing memory system protocols, 1995.

[10] Mark Crovella, et al. Performance debugging using parallel performance predicates, 1993, PADD '93.

[11] Alan L. Cox, et al. Performance debugging shared memory parallel programs using run-time dependence analysis, 1997, SIGMETRICS '97.

[12] Virgílio A. F. Almeida, et al. The Influence of Geographical and Cultural Issues on the Cache Proxy Server Workload, 1998, Comput. Networks.

[13] Leslie Lamport, et al. Time, clocks, and the ordering of events in a distributed system, 1978, CACM.

[14] Wagner Meira, Jr. Understanding parallel program performance using cause-effect analysis, 1998.

[15] Susan L. Graham, et al. Gprof: A call graph execution profiler, 1982, SIGPLAN '82.

[16] Nikolaos Hardavellas, et al. Understanding the Performance of DSM Applications, 1997, CANPC.