gluepy: A Framework for Flexible Programming in Complex Grid Environments

Problem-solving frameworks for large-scale, wide-area environments must handle connectivity issues (NAT and firewalls), keep connection management scalable, accommodate processes that join and leave dynamically, and provide simple means of tolerating communication and node failures. We design and implement such a framework by minimally extending distributed object-oriented models, aiming for maximum generality and flexibility. In the framework, parallelism is expressed via asynchronous method invocations, which allows a natural transition from sequential programs. To cope with asynchronous events such as dynamic joins and the return of asynchronous method invocations, we introduce implicit serialization semantics on objects; this relieves programmers of explicit synchronization primitives while avoiding recursion deadlock problems. In our implementation, participating nodes automatically construct a TCP overlay to address connectivity and scalability issues. We have implemented the framework as a Python library to allow rapid development of complex workflows and to leverage the richness of Python's libraries. For evaluation, we show, on over 900 cores across 9 clusters with complex networks (involving firewalls and NATs) and multiple process management systems (SSH, Torque, and SGE), how a simple branch-and-bound search application can be expressed concisely and executed easily.
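To make the two central ideas concrete (asynchronous method invocation and implicit per-object serialization), the following is a minimal single-process sketch built only on Python's standard library. It is not the gluepy API: the names `SerializedObject`, `async_call`, `BestBound`, `report`, and `run_demo` are hypothetical and chosen for illustration. Each object owns one worker thread, so invocations on the same object never overlap and the method body needs no explicit lock.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class SerializedObject:
    """Hypothetical base class: runs invocations on one object one at a time."""

    def __init__(self):
        # One worker thread per object, so queued calls never overlap.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def async_call(self, method_name, *args) -> Future:
        # Returns immediately with a future for the eventual result.
        return self._executor.submit(getattr(self, method_name), *args)

    def shutdown(self):
        # Wait for queued invocations to finish, then release the thread.
        self._executor.shutdown(wait=True)


class BestBound(SerializedObject):
    """Holds the best (smallest) bound seen so far in a branch-and-bound search."""

    def __init__(self):
        super().__init__()
        self.best = float("inf")

    def report(self, value):
        # No explicit lock: serialization makes this read-modify-write safe.
        if value < self.best:
            self.best = value
        return self.best


def run_demo(values):
    bound = BestBound()
    # Issue all invocations asynchronously, then collect their results.
    futures = [bound.async_call("report", v) for v in values]
    results = [f.result() for f in futures]
    bound.shutdown()
    return bound.best, results
```

Note one limitation of this sketch: if a method of an object waited on a future for another invocation on the same object, the single worker thread would block forever. That is the kind of recursion deadlock the abstract says the framework's semantics avoid; the sketch does not attempt to reproduce that mechanism.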
