HIDStore: A Hierarchical Intermediate Data Storage System for Seismic Processing Application

Seismic processing is a key technology in the petroleum industry. During the execution of seismic processing applications, large amounts of intermediate data are generated and accessed. Providing high-performance service for this intermediate data under the traditional storage architecture is expensive. In addition, the emergence of new storage devices has made the storage environment heterogeneous, which brings considerable inconvenience to application developers and petroleum scientists. In this paper, we present a hierarchical intermediate data storage system called HIDStore. HIDStore builds a distributed storage system on local storage devices and idle network resources to accelerate intermediate data access. Our experiments show that HIDStore improves both the performance of various seismic processing applications and resource utilization in the compute cluster. HIDStore also abstracts different kinds of storage devices into hierarchical logical volumes and provides an easy-to-use API for data access, so developers can handle intermediate data at a high level of abstraction. Applications built on HIDStore can adapt to different storage environments and automatically achieve optimal performance. Intermediate data in HIDStore are automatically evicted once they expire.
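The behavior the abstract describes can be sketched as a small client interface: hierarchical logical volumes layered over heterogeneous devices, a simple put/get API, and automatic eviction of expired intermediate data. This is a minimal illustrative model only; the class name `HIDStoreClient`, the tier labels, and the TTL semantics are assumptions, as the abstract does not show the paper's actual API.

```python
# Hypothetical sketch of a HIDStore-style client. All names and
# semantics here are assumed for illustration, not taken from the paper.
import time


class HIDStoreClient:
    def __init__(self, tiers=("memory", "ssd", "network")):
        # One dict per logical tier, ordered fastest-first.
        self._tiers = {t: {} for t in tiers}

    def put(self, key, data, tier="memory", ttl=None):
        """Store intermediate data; expire it after ttl seconds if given."""
        expires = time.monotonic() + ttl if ttl is not None else None
        self._tiers[tier][key] = (data, expires)

    def get(self, key):
        """Search tiers fastest-first; evict and skip expired entries."""
        now = time.monotonic()
        for entries in self._tiers.values():
            if key in entries:
                data, expires = entries[key]
                if expires is not None and now >= expires:
                    del entries[key]  # automatic eviction on expiry
                    return None
                return data
        return None


if __name__ == "__main__":
    store = HIDStoreClient()
    store.put("trace-0001", b"seismic gather", tier="memory", ttl=60)
    print(store.get("trace-0001"))
```

Hiding the tier layout behind one interface is what lets the same application code run unchanged on clusters with different storage hardware, as the abstract claims.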
