Prioritizing Network Event Handling in Clusters of Workstations

The use of modern system area networking technologies [9,3] to construct tightly integrated clusters of workstations exposes two weaknesses of current operating systems. First, the low latency of current networks is often hidden from the application by the high cost of interrupt handling. Second, under high load, network event handling can consume all processor time, starving applications and seriously degrading performance. This paper addresses the problem of providing efficient and stable network event handling for clusters of workstations and network servers. By stable we mean that the throughput and response time of the system do not suffer when the offered workload exceeds the maximum capacity of the system.

[1] Eric Jul, et al. A Stream Protocol Implementation for an SCI-Based Cluster of Workstations, 1999.

[2] Peter Druschel, et al. Resource containers: a new facility for resource management in server systems, 1999, OSDI '99.

[3] David Mosberger, et al. httperf: a tool for measuring web server performance, 1998.

[4] H.H.J. Hum, et al. Polling Watchdog: Combining Polling and Interrupts for Efficient Message Handling, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[5] Abraham Silberschatz, et al. Signaled Receiver Processing, 2000, USENIX Annual Technical Conference, General Track.

[6] Charles L. Seitz, et al. Myrinet: A Gigabit-per-Second Local Area Network, 1995, IEEE Micro.

[7] Edward W. Felten, et al. Reducing waiting costs in user-level communication, 1997, Proceedings 11th International Parallel Processing Symposium.

[8] David Mosberger, et al. httperf: a tool for measuring web server performance, 1998, PERV.

[9] Peter Druschel, et al. Lazy receiver processing (LRP): a network subsystem architecture for server systems, 1996, OSDI '96.

[10] K. K. Ramakrishnan, et al. Eliminating receive livelock in an interrupt-driven kernel, 1996, TOCS.

[11] Peter Druschel, et al. Measuring the Capacity of a Web Server, 1997, USENIX Symposium on Internet Technologies and Systems.

[12] Peter J. Keleher, et al. Responsiveness without interrupts, 1999, ICS '99.

[13] K. Langendoen, et al. Integrating polling, interrupts, and thread management, 1996, Proceedings of 6th Symposium on the Frontiers of Massively Parallel Computation (Frontiers '96).