A model for optimizing file access patterns using spatio-temporal parallelism

For many years, I/O read time has been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. In this paper, we introduce a model that estimates the read time for a file stored in a parallel filesystem, given the file access pattern. Read times ultimately depend on how the file is stored and on the access pattern used to read it, and the access pattern is dictated by the type of parallel decomposition used. We employ spatio-temporal parallelism, which combines spatial and temporal parallelism, to provide greater flexibility in the choice of file access patterns. Using our model, we configured the spatio-temporal parallelism to design optimized read access patterns that achieved a speedup factor of approximately 400 over traditional file access patterns.
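The abstract does not say how spatial and temporal parallelism are combined. The sketch below shows one common way to do it, under stated assumptions, and is not the authors' implementation. It assumes a time series of 3D grids, one grid per timestep, with the spatial split taken along z. The `P` processes are divided into `G` temporal groups. Each group takes timesteps round-robin, and the ranks inside a group split every timestep into contiguous z-slabs. The function name `spatio_temporal_assign` and its parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    rank: int
    group: int        # temporal group this rank belongs to
    timesteps: tuple  # timesteps read by the group (round-robin over groups)
    z_range: tuple    # half-open [start, stop) z-slab read at each of those timesteps


def spatio_temporal_assign(num_ranks, num_groups, num_timesteps, nz):
    """Hypothetical mapping of ranks to (timesteps, z-slab).

    num_groups == 1 gives pure spatial parallelism (every rank reads a slab
    of every timestep); num_groups == num_ranks gives pure temporal
    parallelism (every rank reads whole timesteps).
    """
    if num_ranks % num_groups != 0:
        raise ValueError("num_ranks must be divisible by num_groups")
    ranks_per_group = num_ranks // num_groups
    base, extra = divmod(nz, ranks_per_group)
    assignments = []
    for rank in range(num_ranks):
        group = rank // ranks_per_group
        local = rank % ranks_per_group
        # Temporal parallelism: groups take timesteps round-robin.
        timesteps = tuple(range(group, num_timesteps, num_groups))
        # Spatial parallelism: near-equal contiguous slabs along z;
        # the first `extra` ranks in a group get one additional slice.
        start = local * base + min(local, extra)
        stop = start + base + (1 if local < extra else 0)
        assignments.append(Assignment(rank, group, timesteps, (start, stop)))
    return assignments


if __name__ == "__main__":
    for a in spatio_temporal_assign(num_ranks=8, num_groups=2, num_timesteps=4, nz=10):
        print(a)
```

Changing `num_groups` shifts the access pattern between two extremes. With few groups, each rank issues many small, strided reads spread over every timestep. With many groups, each rank issues fewer, larger, contiguous reads of whole timesteps. A read-time model such as the one the paper describes is what would be used to choose the best point between these extremes.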
