Trellis-SDP: A simple data-parallel programming interface

Some datasets and computing environments are inherently distributed. For example, image data may be gathered and stored at different locations. Although data parallelism is a well-known computational model, few programming systems are both easy to use for simple applications and able to work across administrative domains. We have designed and implemented a simple programming system, called Trellis-SDP, that facilitates the rapid development of data-intensive applications. Trellis-SDP is layered on top of the Trellis infrastructure, a software system for creating overlay metacomputers: user-level aggregations of computer systems. Trellis-SDP provides a master-worker programming framework in which the worker components can run self-contained binary applications, whether new or existing. We describe two interface functions, trellis_scan() and trellis_gather(), and show how easily simple data-parallel applications, such as Content-Based Image Retrieval (CBIR) and Parallel Sorting by Regular Sampling (PSRS), can achieve reasonable performance.
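The master-worker pattern described above can be illustrated with a minimal sketch. This is not the Trellis-SDP API: the functions `scan` and `gather`, their signatures, and the worker `count_matches` are hypothetical stand-ins. The sketch also assumes that the scan step starts one worker per data partition and that the gather step collects the workers' results at the master, and it uses local threads in place of remote binaries.

```python
from concurrent.futures import ThreadPoolExecutor


def scan(worker, partitions, max_workers=4):
    """Hypothetical 'scan' step: start one worker task per data partition.

    Returns a list of futures, one per partition, in partition order.
    """
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(worker, part) for part in partitions]
    # Already-submitted tasks still run to completion after shutdown(wait=False).
    pool.shutdown(wait=False)
    return futures


def gather(futures):
    """Hypothetical 'gather' step: block until every worker finishes
    and return the results in partition order."""
    return [f.result() for f in futures]


def count_matches(partition):
    """Toy worker: count the even numbers in one partition."""
    return sum(1 for x in partition if x % 2 == 0)


# Three partitions standing in for data stored at three locations.
partitions = [[1, 2, 3, 4], [5, 6, 7], [8, 10, 12]]

handles = scan(count_matches, partitions)
results = gather(handles)
total = sum(results)
```

Separating the two steps lets the master do other work between launching the workers and collecting their output, which is the main reason for choosing a two-call design in this sketch.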
