A Systematic Characterization of Application Sensitivity to Network Performance

This thesis provides a systematic study of application sensitivity to network performance. Our aim is to investigate the impact of communication performance on real applications. Using the LogGP model as an abstract framework, we set out to understand which aspects of communication performance are most important. Our investigation therefore centers on quantifying the sensitivity of applications to the parameters of the LogGP model: network latency, software overhead, and per-message and per-byte bandwidth. We define sensitivity as the change in some application performance metric, such as run time or updates per second, as a function of the LogGP parameters. The strong association of the LogGP model with real machine components also allows us to draw architectural conclusions from the measured sensitivity curves.
The basic methodology for measuring sensitivity is simple. First, we build a networking apparatus whose parameters are adjustable according to the LogGP model. To build such an apparatus, we start with a higher-performance system than is generally available and add controllable delays to it. Next, we calibrate the apparatus to make sure the parameters can be accurately controlled according to the model. The calibration also yields the useful range of LogGP parameters we can consider. Once we have a calibrated apparatus, we run real applications in a network with controllable performance characteristics. We vary each LogGP parameter in turn to observe the sensitivity of the application to that single parameter. Sensitive applications exhibit a high rate of "slowdown" as we scale a given parameter. Insensitive applications show little or no difference in performance as we change the parameters. In addition, we can categorize the shape of the slowdown curve because our apparatus allows us to observe plateaus or other discontinuities.
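The one-parameter-at-a-time sweep described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the apparatus used in the thesis: the names `LogGP`, `message_time`, `slowdown`, `sweep`, and `toy_app` are hypothetical, the units (microseconds) and all numeric values are made up, and `toy_app` stands in for a real application run with a trivial cost formula. The per-message cost in `message_time` follows the usual LogGP accounting of send overhead, per-byte gap, latency, and receive overhead.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LogGP:
    """LogGP parameters; microseconds are an assumed unit."""
    L: float  # network latency
    o: float  # software overhead per send or per receive
    g: float  # gap between messages (1/g is per-message bandwidth)
    G: float  # gap per byte (1/G is per-byte bandwidth)


def message_time(p: LogGP, nbytes: int) -> float:
    """End-to-end time of one message of nbytes bytes under LogGP."""
    return p.o + (nbytes - 1) * p.G + p.L + p.o


def slowdown(run_time: float, baseline_run_time: float) -> float:
    """Run time relative to the run time on the unmodified system."""
    return run_time / baseline_run_time


def sweep(run_app, base: LogGP, param: str, values):
    """Vary a single parameter, holding the others at their baseline."""
    t0 = run_app(base)
    return [(v, slowdown(run_app(replace(base, **{param: v})), t0))
            for v in values]


def toy_app(p: LogGP) -> float:
    """Hypothetical stand-in for an application run (not from the thesis):
    fixed compute time plus send and receive overhead for 100 messages."""
    compute_us = 1000.0
    messages = 100
    return compute_us + messages * 2 * p.o


if __name__ == "__main__":
    base = LogGP(L=5.0, o=3.0, g=6.0, G=0.5)  # made-up baseline values
    for name, values in (("o", [3.0, 6.0, 12.0]), ("L", [5.0, 10.0, 20.0])):
        for value, s in sweep(toy_app, base, name, values):
            print(f"{name}={value:6.1f}  slowdown={s:.3f}")
```

In this toy setting the slowdown curve rises with `o` and stays flat as `L` grows, simply because `toy_app` charges overhead per message and ignores latency. A real sweep would replace `toy_app` with a measured run on the calibrated apparatus.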
In all cases, we compare our measured results against analytic models of the applications. The analytic models serve as a check on our measured data. Points where the data and the model deviate from one another expose areas that warrant further investigation.
We use three distinct application suites in order to broaden the applicability of our results. The first suite consists of parallel programs designed for low-overhead Massively Parallel Processors (MPPs) and Networks of Workstations (NOWs). The second suite is a subset of the NAS parallel benchmarks, which were designed on older MPPs. The final suite consists of the SPECsfs benchmark, which is designed to measure Network File System (NFS) performance over local area networks.
Our results show that applications display the strongest sensitivity to software overhead, slowing down by as much as a factor of 50 when overhead is increased by a factor of 20. Even lightly communicating applications can suffer a factor of 3-5 slowdown. Frequently communicating applications also display strong sensitivity to various bandwidths, suggesting that communication phases are bursty and limited by the rate at which messages can be injected into the network. We found that simple models are able to predict sensitivity to the software overhead and bandwidth parameters for most of our applications. We also found that queuing-theoretic models of NFS servers are useful in understanding the performance of industry-published SPECsfs benchmark results. The effect of added latency is qualitatively different from the effect of added overhead and bandwidth. Further, the effects of latency are harder to predict because they depend more on application structure. For our measured applications, the sensitivity to overhead and various bandwidths is much stronger than the sensitivity to latency.
We found that this result stems from programmers being quite adept at using latency-tolerating techniques such as pipelining, overlapping, batching, and caching. However, many of these techniques are still sensitive to software overhead and band
