Demonstrating Distributed Workflow Computing with a Federating Wide-Area File System

We have demonstrated how a wide-area SLASH2 file system [1] supports remote bioinformatics workflows between Extreme Science and Engineering Discovery Environment (XSEDE) [2] sites, using the Galaxy Project's web-based platform [3] for reproducible data analysis. Wide-area Galaxy workflows were enabled by establishing a geographically distributed SLASH2 instance between the Greenfield [4] system at the Pittsburgh Supercomputing Center [5] and virtual machines incorporating storage within the Corral [6] file system at the Texas Advanced Computing Center [7]. Analysis tasks submitted through a single Galaxy instance can seamlessly use data available at either site. In this paper, we explore the advantages of SLASH2 for enabling workflows from Galaxy Main [8].