Approximate Solution Approach and Performability Evaluation of Large Scale Beowulf Clusters

Beowulf clusters are popular and deployed worldwide in support of scientific computing because of their high computational power and performance. However, they pose several challenges and must still provide high availability. In practice, the behaviour of large-scale Beowulf clusters, including their fault tolerance, is unpredictable, and the outcomes are often detrimental. Achieving high performance when storing and processing huge amounts of data in large-scale clusters requires accurate quality of service (QoS) evaluation. This motivates the design and development of analytical models to understand and predict complex system behaviour and thereby ensure the availability of large-scale systems. Exact modelling of such clusters is not feasible because of the large number of nodes and the diversity of user requests. An analytical model for the QoS of large-scale server farms, together with suitable solution approaches, is therefore necessary. In this paper, analytical modelling of large-scale Beowulf clusters is considered together with availability issues. A generic and flexible approximate solution approach is developed to handle large numbers of nodes in performability evaluation. The proposed analytical model and approximate solution approach provide the flexibility to evaluate QoS measures for such systems. To show the efficacy and accuracy of the proposed approach, the results obtained from the analytical model are validated against results obtained from discrete event simulations.
