Paper information - Lessons learned while operating two large SCI clusters

Lessons learned while operating two large SCI clusters

The availability of commodity high-performance components for workstations and networks has made it possible to build large, PC-based compute clusters at modest cost. These clusters appear to be a realistic alternative to proprietary, massively parallel systems with respect to price/performance ratio. However, from an administration point of view, such systems are often still merely a collection of autonomous nodes connected by a fast short area network. Considerable work is therefore needed before they deliver the best possible performance to all users in daily operation. This paper describes the problem areas we had to cope with while integrating two large SCI clusters (one with 64 and one with 192 processors) into the environment of the Paderborn Center for Parallel Computing.

Axel Keller | Andreas Krawinkel

[1] Brent Callaghan, et al. NFS Version 3 Protocol Specification, 1995, RFC.

[2] Axel Keller, et al. CCS resource management in networked HPC systems, 1998, Proceedings Seventh Heterogeneous Computing Workshop (HCW'98).

[3] Hans-Ulrich Heiss, et al. Shared Memory Programming on PC-based SCI Clusters, 1999.

[4] Hans-Ulrich Heiss, et al. SCI for TCP/IP with Linux, 1999.

[5] Charles L. Seitz, et al. Myrinet: A Gigabit-per-Second Local Area Network, 1995, IEEE Micro.

[6] Miron Livny, et al. A worldwide flock of Condors: Load sharing among workstation clusters, 1996, Future Gener. Comput. Syst.

[7] Jörn Gehring, et al. Architecture-Independent Request-Scheduling with Tight Waiting-Time Estimations, 1996, JSSPP.

[8] Axel Keller, et al. RSD: Resource and Service Description, 1998.

[9] Axel Keller, et al. Resource Management for High-Performance PC Clusters, 1999, HPCN Europe.

[10] Tim Howes, et al. Lightweight Directory Access Protocol, 1995, RFC.

[11] Hermann Hellwagner, et al. SCI: Scalable Coherent Interface: Architecture and Software for High-Performance Compute Clusters, 1999.

[12] Jack Dongarra, et al. PVM: A Users' Guide and Tutorial for Network Parallel Computing, 1994.

[13] Axel Keller, et al. RsdEditor: a graphical user interface for specifying metacomputer components, 2000, Proceedings 9th Heterogeneous Computing Workshop (HCW 2000) (Cat. No. PR00556).