Big Data Staging with MPI-IO for Interactive X-ray Science

New techniques in X-ray scattering science experiments produce large data sets whose analysis can require millions of hours of high-performance computation per week. In such applications, data is typically moved from X-ray detectors to a large parallel file system shared by all nodes of a petascale supercomputer and then read repeatedly as different science application tasks proceed. This straightforward implementation, however, causes significant contention in the file system. We propose an alternative approach in which data is instead staged into and cached in compute-node memory for extended periods, during which time various processing tasks may access it efficiently. We describe here such a big data staging framework, based on MPI-IO and the Swift parallel scripting language. We discuss a range of large-scale data management issues involved in X-ray scattering science and measure the performance benefits of the new staging framework for high-energy diffraction microscopy, an important emerging application in data-intensive X-ray scattering. We show that our framework accelerates scientific processing turnaround from three months to under 10 minutes, and that our I/O technique reduces input overheads by a factor of 5 on 8K Blue Gene/Q nodes.
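
To make the staging idea concrete, the following is a minimal sketch in C with MPI-IO of the general pattern described above: every rank performs one collective read of its slice of a detector data file and keeps that slice resident in compute-node memory, so that subsequent analysis tasks reuse the cached data instead of re-reading the shared parallel file system. The file name, block decomposition, and buffer handling here are illustrative assumptions, not the paper's actual implementation.

```c
/*
 * Sketch only: stage a detector data file into compute-node memory with a
 * single collective MPI-IO read, then let later analysis steps reuse the
 * cached slice. Assumes a simple contiguous block decomposition and a
 * per-rank slice smaller than 2 GiB.
 */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Hypothetical input file; in practice this would be detector output. */
    const char *path = (argc > 1) ? argv[1] : "detector_frames.bin";

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, path, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    MPI_Offset fsize = 0;
    MPI_File_get_size(fh, &fsize);

    /* Simple block decomposition: each rank stages one contiguous slice. */
    MPI_Offset chunk  = (fsize + nranks - 1) / nranks;   /* ceil(fsize / nranks) */
    MPI_Offset offset = (MPI_Offset)rank * chunk;
    MPI_Offset count  = 0;
    if (offset < fsize)
        count = (offset + chunk <= fsize) ? chunk : (fsize - offset);

    /* The slice stays cached in node memory after the read completes. */
    char *cache = malloc((size_t)count);

    /* One collective read replaces many independent re-reads by later tasks. */
    MPI_File_read_at_all(fh, offset, cache, (int)count, MPI_BYTE,
                         MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    /* ... analysis tasks would now operate repeatedly on 'cache' ... */

    free(cache);
    MPI_Finalize();
    return 0;
}
```

The single collective read lets the MPI-IO layer aggregate requests across ranks; in the framework described in the paper, the cached slices would then be served to Swift-coordinated analysis tasks rather than discarded when the program exits.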
