Configuration and performance of a Beowulf cluster for large-scale scientific simulations

To achieve optimal performance on a Beowulf cluster for large-scale scientific simulations, it is necessary to combine the right numerical method with an efficient implementation that exploits the cluster's critical high-performance components. This process is demonstrated using a simple but prototypical problem: solving a time-dependent partial differential equation.

Beowulf clusters in virtually every price range are readily available today for purchase in fully integrated form from a wide variety of vendors. The University of Maryland, Baltimore County (UMBC) purchased a medium-sized 64-processor cluster with a high-performance interconnect and extended disk storage from IBM. The cluster has several critical components, and this article demonstrates their roles using a prototype problem from the numerical solution of time-dependent partial differential equations (PDEs). The problem was selected to show how judiciously combining a numerical algorithm and its efficient implementation with the right hardware (in this case, the Beowulf cluster) can achieve the two fundamental goals of parallel computing: to solve problems faster and to solve larger problems than is possible on a serial computer.