On the appropriateness of commodity operating systems for large-scale, balanced computing systems

In the past five years (1997-2002), we have been involved in the design and development of Cplant™. An important goal was to take advantage of commodity approaches wherever possible. In particular, we selected Linux, a commonly available operating system, for the compute nodes of Cplant™. While the use of commodity solutions, including Linux, was critical to the success of Cplant™, we believe that such an approach will not be viable in the development of the next generation of very large-scale systems. We present our definition of a balanced system and discuss several limitations of commodity operating systems in the context of balanced systems. These limitations are categorized into technical limitations (e.g., the structure of the virtual memory system) and social limitations (e.g., the kernel development process). While our direct experience is based on Linux, issues we have identified should be relevant to all commodity operating systems.
