Optimizing FastQuery Performance on Lustre File System

FastQuery is a parallel indexing and querying system we developed to accelerate the analysis and visualization of scientific data. We have applied it to a wide variety of HPC applications and, in previous work, demonstrated its capability and scalability on a petascale trillion-particle simulation. In our experience, however, the read and write performance of FastQuery, like that of many other HPC applications, can be significantly affected by tunable parameters throughout the parallel I/O stack. In this paper, we describe how we tuned the performance of FastQuery on a Lustre parallel file system. We study and analyze the impact of parameters and tunable settings at the file system, MPI-IO library, and HDF5 library levels of the I/O stack. We demonstrate that a combined optimization strategy significantly improves the performance and I/O bandwidth of FastQuery. In our tests with a trillion-particle dataset, the time to index the dataset was reduced by more than half.
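To make the three tuning levels concrete, the following is a minimal sketch of how settings at the file system, MPI-IO, and HDF5 layers might be derived from one Lustre striping choice. It is not the configuration used in the paper: the helper names (`lustre_io_hints`, `hdf5_alignment`, `align_up`) are hypothetical, and the stripe count and stripe size in the usage lines are illustrative assumptions. The hint keys `striping_factor`, `striping_unit`, `cb_nodes`, and `cb_buffer_size` are standard MPI-IO hints, and `romio_cb_write` is a ROMIO-specific hint; whether a given MPI library honors them depends on the installation.

```python
def align_up(offset, stripe_size_bytes):
    """Round a byte offset up to the next stripe boundary."""
    if stripe_size_bytes < 1:
        raise ValueError("stripe size must be positive")
    return ((offset + stripe_size_bytes - 1) // stripe_size_bytes) * stripe_size_bytes


def lustre_io_hints(stripe_count, stripe_size_bytes, aggregators_per_ost=1):
    """Build MPI-IO hint strings that match collective I/O to the striping.

    Assumption: one aggregator per storage target (OST) is a reasonable
    starting point; the best ratio is system dependent.
    """
    if stripe_count < 1 or stripe_size_bytes < 1 or aggregators_per_ost < 1:
        raise ValueError("all tuning values must be positive")
    return {
        # File system level: requested stripe count and stripe size.
        "striping_factor": str(stripe_count),
        "striping_unit": str(stripe_size_bytes),
        # MPI-IO level: collective buffering sized to the stripes.
        "cb_nodes": str(stripe_count * aggregators_per_ost),
        "cb_buffer_size": str(stripe_size_bytes),
        "romio_cb_write": "enable",
    }


def hdf5_alignment(stripe_size_bytes, threshold_bytes=None):
    """Return (threshold, alignment) in the argument order of H5Pset_alignment.

    Assumption: objects at least one stripe in size are aligned to stripe
    boundaries unless another threshold is given.
    """
    if threshold_bytes is None:
        threshold_bytes = stripe_size_bytes
    return (threshold_bytes, stripe_size_bytes)


# Illustrative values only: 64 stripes of 1 MiB each.
hints = lustre_io_hints(stripe_count=64, stripe_size_bytes=1 << 20)
threshold, alignment = hdf5_alignment(1 << 20)
```

In a real program the dictionary entries would be copied into an MPI info object (for example with `Info.Set(key, value)` in mpi4py) and passed to the file-open call, and the alignment pair would be applied to the HDF5 file access property list. The sketch only shows how the settings at the three levels can be kept consistent with each other.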
