Design, Implementation, and Evaluation of Trellis-SDP for File-Level Data Parallelism

Although data parallelism is a well-known computational model, there are few programming systems that are both easy to program (for simple applications) and able to work across administrative domains. For data sets that are often inherently distributed (e.g., collections of image data), there is a need for a simple data-parallel programming system. We describe the design, implementation, and evaluation of Trellis-SDP, a simple data-parallel programming system that facilitates the rapid development of data-intensive applications. Trellis-SDP is layered on top of the Trellis infrastructure, a software system for creating overlay metacomputers: user-level aggregations of computer systems. Trellis-SDP is based on file-level data parallelism and provides a Master-Worker programming framework in which the worker components can run self-contained binary applications, whether new or existing. We evaluate our programming system with a non-trivial seismic data processing application.
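To make the idea of file-level data parallelism in a Master-Worker framework concrete, the following is a minimal sketch, not the Trellis-SDP interface. It assumes that each input file is an independent unit of work and that the worker is an existing binary that takes a file path as its last argument. The names `run_worker` and `master` are hypothetical, and the sketch runs on a single machine with a thread pool rather than across administrative domains.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_worker(command, input_path):
    """Run one self-contained binary on one input file and capture its stdout.

    `command` is the program and its fixed arguments (hypothetical; any
    existing executable that accepts a file path as its last argument).
    """
    result = subprocess.run(
        list(command) + [str(input_path)],
        capture_output=True,
        text=True,
        check=True,  # raise if the worker binary exits with an error
    )
    return str(input_path), result.stdout


def master(command, input_paths, num_workers=4):
    """Hand each file to a worker; the file is the unit of parallelism.

    Returns a dict mapping each input path to the worker's output.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(lambda path: run_worker(command, path), input_paths)
        return dict(results)
```

Because the worker is an unmodified executable, the master only needs to know the list of files and the command line. A call such as `master(["wc", "-c"], ["a.dat", "b.dat"])` would process the two files concurrently, assuming `wc` is available on the system.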
