Distributed File Streamer: A Framework for Distributed Application Data Coupling

File transfer is ubiquitous in modern distributed computing environments. Protocols such as HTTP and FTP are designed for downloading files from, or uploading files to, servers. Other tools, such as 'secure copy', transfer files between hosts securely. In this paper, file transfer is considered in the context of connecting distributed applications, where the output of a data producer on one node becomes the input of a data consumer on another node. Using intermediate files as the medium that connects the computational phases of a workflow is a common paradigm in grid environments. Distributed File Streamer (DFS), as its name implies, uses data streaming to couple distributed applications. Instead of waiting for a producer application's output to be transferred completely to the consumer node, DFS streams the data over the network directly to the consumer program, managing the data flow efficiently and providing a framework for partial file consumption. This paper describes the architecture of the DFS framework, presents an analysis of its performance model, and provides results from several examples that demonstrate the advantages of DFS over the traditional approach of transferring complete files.
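The idea of coupling a producer and a consumer through a stream, rather than through a completed intermediate file, can be illustrated with a minimal sketch. This is not the DFS implementation or its API, which the abstract does not describe. The function names `producer`, `consumer`, and `run_coupled`, the newline-delimited record format, and the chunk size are all assumptions made for illustration, and a local socket pair stands in for the network connection between two nodes.

```python
import socket
import threading

CHUNK_SIZE = 4096  # hypothetical read size, not taken from the paper


def producer(sock, n_records):
    """Send each record as soon as it is generated (hypothetical producer)."""
    with sock:
        for i in range(n_records):
            sock.sendall(f"record {i}\n".encode())
    # Leaving the with-block closes the socket; the consumer sees end of stream.


def consumer(sock):
    """Count complete records as they arrive, without waiting for all output."""
    count = 0
    buffer = b""
    with sock:
        while True:
            chunk = sock.recv(CHUNK_SIZE)
            if not chunk:  # empty read means the producer closed the stream
                break
            buffer += chunk
            # Keep any trailing partial record in the buffer for the next read.
            *lines, buffer = buffer.split(b"\n")
            count += len(lines)  # a real consumer would process each line here
    return count


def run_coupled(n_records):
    """Run producer and consumer concurrently over a connected socket pair."""
    producer_end, consumer_end = socket.socketpair()
    worker = threading.Thread(target=producer, args=(producer_end, n_records))
    worker.start()
    consumed = consumer(consumer_end)
    worker.join()
    return consumed


if __name__ == "__main__":
    print(run_coupled(1000))
```

In this sketch the consumer starts working on the first records while the producer is still generating later ones, which is the overlap that a file-based handoff does not allow. Partial consumption corresponds to processing each complete record from the buffer as soon as it is available.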
