The application of cloud computing to the creation of image mosaics and management of their provenance

We have used the Montage image mosaic engine to investigate the cost and performance of processing images on the Amazon EC2 cloud. We present a detailed comparison of the performance of Montage on the cloud and on the Abe high-performance cluster at the National Center for Supercomputing Applications (NCSA). Because Montage generates many intermediate products, we have also used it to understand the science requirements that higher-level products impose on provenance management technologies, and we describe experiments with such technologies, including the "Provenance Aware Service Oriented Architecture" (PASOA).
