An Adaptable Seismic Data Format

We present ASDF, the Adaptable Seismic Data Format, a modern and practical data format for all branches of seismology and beyond. The growing volume of freely available data, coupled with ever-expanding computational power, opens avenues to tackle larger and more complex problems. Current bottlenecks include inefficient resource usage and insufficient data organization. Properly scaling a problem requires resolving both of these challenges, and existing data formats are no longer up to the task. ASDF stores any number of synthetic, processed or unaltered waveforms in a single file. A key improvement over existing formats is the inclusion of comprehensive meta information, such as event or station information, in the same file. The format can also hold non-waveform data, for example cross-correlations, adjoint sources or receiver functions. Finally, full provenance information can be stored alongside each item of data, thereby enhancing reproducibility and accountability. Any data set in our proposed format is self-describing and can be readily exchanged with others, facilitating collaboration. Using the HDF5 container format provides efficient and parallel I/O operations, integrated compression algorithms and checksums to guard against data corruption. To avoid reinventing the wheel and to build upon past developments, we use existing standards such as QuakeML, StationXML, W3C PROV and HDF5 wherever feasible. Usability and tool support are crucial for any new format to gain acceptance. We developed mature C/Fortran- and Python-based APIs coupling ASDF to the widely used SPECFEM3D_GLOBE and ObsPy toolkits.
