A mathematical model for performability of Beowulf clusters

Beowulf clusters have become popular worldwide as an alternative to supercomputers. However, one of the most pressing issues for today's cluster solutions is the need for high availability and performance. Such systems are clearly prone to breakdowns. Even if cover is provided with some probability c, reconfiguration and/or rebooting delays are incurred before the cluster resumes operation. This paper presents a performance model for Beowulf multiprocessor systems, in which one head processor and several identical processors serve a common stream of arriving jobs. To account for the delays due to reconfiguration and rebooting, such systems are modelled and solved for exact performability measures, for both bounded and unbounded queuing capacities, using the spectral expansion method.
