Design and implementation of an easy-to-use automated system to build Beowulf parallel computing clusters

Since the Beowulf concept was born in 1994, when NASA constructed the first Beowulf cluster using IBM PCs, several schemes have been developed. This type of cluster allows high-performance parallel computing on low-cost PC hardware. In this work we have designed and implemented a new operating system for the automatic construction and installation of a Beowulf parallel computing cluster. In particular, the paper presents the operating system and shows how it provides an easy way to construct a parallel computing cluster. The system is capable of supporting up to 254 computers, and it aims to provide an environment in which a cluster for any parallel computing application can be built instantly, from anywhere, by anybody. The main objective is to provide researchers with an automated, easy-to-use tool for constructing low-cost parallel computing clusters in their own laboratories, either to solve concrete problems or as a preliminary step before running the code on a large computing facility such as Blue Gene or MareNostrum. The system is fully functional and can be obtained and distributed under a GNU license from the group website: www.ehu.es/AC.
