Fire Phoenix Cluster Operating System Kernel and its Evaluation

The Fire Phoenix cluster operating system kernel (Phoenix kernel) is a minimal set of core cluster functions with scalability and fault-tolerance support. In this paper, we define the components of a cluster operating system kernel and introduce its internal mechanisms for scalability and fault-tolerance support. User environments can be easily constructed on top of Phoenix kernel according to users' needs. In addition, we evaluate Phoenix kernel from four perspectives: fault tolerance, scalability, performance impact on scientific computing, and ease of constructing user environments. Our design has been proven in practice on the Dawning 4000A super server, which is the largest cluster system for scientific computing in China.
