Peer-to-Peer Data Sharing for Scientific Workflows on Amazon EC2

In this paper, we consider the problem of data sharing in scientific workflows running on the cloud. We present the design and evaluation of a peer-to-peer approach to help solve this problem. We compare the performance of our peer-to-peer file manager with that of two network file systems for storing data for a typical data-intensive workflow application. Our results show that while our peer-to-peer file manager performs significantly better than one of the network file systems tested, it does not perform as well as the other. Finally, we discuss the various issues that might have affected the performance of our peer-to-peer file manager.
