Experiences Deploying Parallel Applications on a Large-scale Grid

We describe our experiences with integrating several Grid software components into a single coherent system that is used to write and run parallel applications on the Grid. The integrated components are the Grid Application Toolkit (GAT), ProActive, Satin and Ibis. We experimented with this (Java-based) system by participating in the N-Queens contest of the Grids@work event in October 2005. In addition to integrating available components, we wrote a ProActive plugin for the GAT, a parallel N-Queens solver, and an application to manage Grid deployment of N-Queens. We identified several connectivity issues and scalability problems in the components we use. We show how we modified some of the components to solve these problems. We successfully ran experiments on 960 processors across Grid'5000, with an efficiency of around 85%, winning the prize for the largest number of nodes deployed during the contest.

The Grids@work event held in October 2005 in Sophia Antipolis, France (7) was composed of a series of conferences and tutorials, including the 2nd Grid Plugtests. The objectives were to bring together Grid users, to present and discuss current and future features of the ProActive Grid platform, and to test the deployment and interoperability of Grid applications on various Grids. Part of the 2nd Grid Plugtests consisted of an N-Queens contest, where the aim was to find the number of solutions to the N-Queens problem, with N as large as possible, in a limited amount of time.

We participated in this contest with a parallel N-Queens application. We used this application and the Grid testbed that was provided to integrate many software components, and to evaluate their integration, functionality and performance. The global structure of the system we used is shown in Figure 1. For portability reasons, all software components are written in Java. The N-Queens application itself is written in Satin, our Java-based divide-and-conquer programming model (9). Satin is implemented on top of the Ibis (11) communication library, while the application was deployed with a manager application that was written specifically for this contest. The manager uses the Java Grid Application Toolkit (2) (GAT) to access the Grid. The Java GAT in turn uses the ProActive (5) middleware for Grid deployment.

This paper describes our experiences with integrating all the different software components we used, and identifies some problems in our software packages, Ibis and Satin in particular, that we discovered during the contest. Some are related to scaling a parallel programming system up to 1000 machines that are distributed over a large geographical area; others are related to typical Grid issues such as firewall problems and network misconfigurations. Although we did encounter some difficulties, we were still able to run the parallel N-Queens application on 961 CPUs scattered across different Grid'5000 sites in France. Finally, we suggest possible solutions for the problems encountered. After identifying and solving the problems we describe in this paper, we won the prize for the largest number of nodes deployed in parallel during the contest.

The remainder of this paper is structured as follows. First, we discuss the deployment tools GAT and ProActive (Section II), followed by Ibis and Satin (Section III), and then the N-Queens application itself (Section IV) and the testbed (Section V). Section VI discusses the issues we encountered using a large-scale Grid, along with the solutions we applied. Sections VII and VIII summarize our results and conclusions, respectively.
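
To make the contest task concrete, the following is a minimal sketch of counting N-Queens solutions in a divide-and-conquer style. It is not the Satin solver used in the contest and does not use Satin's API: it assumes Java's standard fork/join framework as a single-machine stand-in for spawning and synchronizing subproblems. The names `NQueensSketch`, `Count`, `solve`, `countSolutions` and `SPAWN_DEPTH` are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative sketch only: a divide-and-conquer N-Queens counter on one JVM.
// Assumes n < 32 so that a board row fits in an int bitmask.
class NQueensSketch {

    // Hypothetical cutoff: rows below this depth are solved sequentially.
    static final int SPAWN_DEPTH = 3;

    // Sequential bitmask search. cols, diagL and diagR mark the squares in
    // the current row that are attacked by queens placed in earlier rows.
    static long solve(int n, int row, int cols, int diagL, int diagR) {
        if (row == n) {
            return 1; // all rows filled: one complete solution
        }
        long total = 0;
        int free = ~(cols | diagL | diagR) & ((1 << n) - 1);
        while (free != 0) {
            int bit = free & -free; // lowest free square in this row
            free -= bit;
            total += solve(n, row + 1, cols | bit,
                           (diagL | bit) << 1, (diagR | bit) >>> 1);
        }
        return total;
    }

    // One subproblem: a partial placement covering rows 0..row-1.
    static final class Count extends RecursiveTask<Long> {
        private final int n, row, cols, diagL, diagR;

        Count(int n, int row, int cols, int diagL, int diagR) {
            this.n = n; this.row = row; this.cols = cols;
            this.diagL = diagL; this.diagR = diagR;
        }

        @Override
        protected Long compute() {
            // Small subproblems are not worth splitting further.
            if (row >= SPAWN_DEPTH || row == n) {
                return solve(n, row, cols, diagL, diagR);
            }
            // Divide: one child task per legal queen position in this row.
            List<Count> children = new ArrayList<>();
            int free = ~(cols | diagL | diagR) & ((1 << n) - 1);
            while (free != 0) {
                int bit = free & -free;
                free -= bit;
                Count child = new Count(n, row + 1, cols | bit,
                                        (diagL | bit) << 1, (diagR | bit) >>> 1);
                child.fork();
                children.add(child);
            }
            // Conquer: wait for the children and sum their counts.
            long total = 0;
            for (Count c : children) {
                total += c.join();
            }
            return total;
        }
    }

    static long countSolutions(int n) {
        return ForkJoinPool.commonPool().invoke(new Count(n, 0, 0, 0, 0));
    }

    public static void main(String[] args) {
        System.out.println(countSolutions(8)); // prints 92
    }
}
```

Each placement of a queen in the current row yields an independent subproblem, which is what makes the problem a natural fit for a divide-and-conquer model. The cutoff depth trades task-creation overhead against the number of subproblems available for load balancing.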

[1] Kees Verstoep, et al. Wide-area communication for grids: an integrated solution to connectivity, performance and security problems, 2004, Proceedings. 13th IEEE International Symposium on High Performance Distributed Computing, 2004.

[2] Jason Maassen, et al. Ibis: an efficient Java-based grid programming environment, 2002, JGI '02.

[3] Denis Caromel, et al. Towards seamless computing and metacomputing in Java, 1998.

[4] Henri E. Bal, et al. Efficient load balancing for wide-area divide-and-conquer applications, 2001, PPoPP '01.

[5] Jason Maassen, et al. Efficient Java RMI for parallel programming, 2001, TOPL.

[6] Denis Caromel, et al. Interactive and descriptor-based deployment of object-oriented grid applications, 2002, Proceedings 11th IEEE International Symposium on High Performance Distributed Computing.

[7] Rob van Nieuwpoort, et al. The Grid Application Toolkit: Toward Generic and Easy Application Programming Interfaces for the Grid, 2005, Proceedings of the IEEE.

[8] John Shalf, et al. Enabling Applications on the Grid: A Gridlab Overview, 2003, Int. J. High Perform. Comput. Appl.

[9] Robert D. Blumofe, et al. Scheduling multithreaded computations by work stealing, 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.