Large-scale seismic signal analysis with Hadoop

In seismology, waveform cross correlation has been used for years to produce high-precision hypocenter locations and to build sensitive detectors. Because correlated seismograms are generally found only at small hypocenter separation distances, correlation detectors have historically been reserved for spotlight (site-specific) monitoring. However, many regions have been found to produce large numbers of correlated seismograms, and there is growing interest in building next-generation pipelines that employ correlation as a core part of their operation. To better understand the distribution and behavior of correlated seismic events, we have cross correlated a global dataset of over 300 million seismograms. This was done on a conventional distributed cluster and required 42 days. In anticipation of processing much larger datasets, we have re-architected the system to run as a series of MapReduce jobs on a Hadoop cluster, achieving a factor of 19 performance increase on a test dataset. We found that fundamental algorithmic transformations were required to achieve the maximum performance increase. Whereas the original IO-bound implementation went to great lengths to minimize IO, in the Hadoop implementation, where IO is cheap, we were able to greatly increase the parallelism of our algorithms by performing a tiered series of very fine-grained (highly parallelizable) transformations on the data. Each of these MapReduce jobs required reading and writing large amounts of data, but because IO is very fast and the fine-grained computations could be handled extremely quickly by the mappers, the net result was a large performance gain.
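Waveform cross correlation, the core computation described above, scores how similar two seismograms are over a range of time shifts. The sketch below is a minimal pure-Python illustration, not the authors' code: it assumes equal-length, already filtered and windowed traces and normalizes by full-trace energy, and the function names `demean` and `max_normalized_cc` and the `max_lag` parameter are hypothetical.

```python
import math


def demean(trace):
    """Return a copy of the trace with its mean removed."""
    mean = sum(trace) / len(trace)
    return [v - mean for v in trace]


def max_normalized_cc(x, y, max_lag):
    """Return (best_cc, best_lag) over lags in [-max_lag, max_lag].

    cc(lag) = sum_i x[i] * y[i + lag] / sqrt(sum x^2 * sum y^2), taken over
    the overlapping samples of the demeaned traces. A positive lag means
    the matching features in y arrive later than in x.
    """
    if len(x) != len(y):
        raise ValueError("traces must have equal length")
    x, y = demean(x), demean(y)
    # Normalize by full-trace energy so identical traces score 1.0 at lag 0.
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        return 0.0, 0  # a constant trace carries no waveform shape
    n = len(x)
    best_cc, best_lag = -math.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        # Sum only over indices i where both x[i] and y[i + lag] exist.
        lo, hi = max(0, -lag), min(n, n - lag)
        s = sum(x[i] * y[i + lag] for i in range(lo, hi))
        cc = s / norm
        if cc > best_cc:
            best_cc, best_lag = cc, lag
    return best_cc, best_lag


# Usage: y is x delayed by two samples.
x = [0, 0, 1, 2, 1, 0, 0, 0]
y = [0, 0, 0, 0, 1, 2, 1, 0]
cc, lag = max_normalized_cc(x, y, max_lag=3)
print(f"max cc = {cc:.3f} at lag {lag}")  # prints: max cc = 0.875 at lag 2
```

This direct loop costs O(n × lags) per pair and is only meant to show the arithmetic; at the scale of hundreds of millions of seismograms an FFT-based correlation would be the usual choice.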
A global dataset of over 300 million waveforms has been cross correlated.
The algorithms have been adapted to run as MapReduce jobs on a Hadoop cluster.
Increased parallelism was required to make the best use of the mappers.
IO was significantly increased but had little impact on performance.
A factor of 19 speedup was achieved relative to the initial implementation.
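The tiered, fine-grained decomposition described in the abstract can be pictured with a toy, in-memory model of chained MapReduce jobs. This is neither Hadoop's API nor the authors' actual job layout: the record format (event id, station, phase, samples), the grouping by station and phase, the zero-lag correlation, the "best score across stations" rule and `CC_THRESHOLD = 0.7` are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations

CC_THRESHOLD = 0.7  # hypothetical detection threshold


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle/sort by key, reduce."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    output = []
    for key in sorted(shuffled):  # Hadoop also sorts keys before reducing
        output.extend(reducer(key, shuffled[key]))
    return output


def zero_lag_cc(x, y):
    """Pearson correlation of two equal-length traces at zero lag."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


# Tier 1: group waveforms by (station, phase) and emit every candidate pair.
def key_by_station_phase(record):
    event_id, station, phase, samples = record
    yield (station, phase), (event_id, samples)


def emit_pairs(key, waveforms):
    # Sorting by event id makes each pair key canonical across stations.
    ordered = sorted(waveforms, key=lambda w: w[0])
    for (ev_a, xa), (ev_b, xb) in combinations(ordered, 2):
        yield (key, ev_a, ev_b, xa, xb)


# Tier 2: each map call correlates a single pair (a tiny, independent task);
# the reducer keeps the best score per event pair across stations.
def correlate_pair(record):
    _key, ev_a, ev_b, xa, xb = record
    yield (ev_a, ev_b), zero_lag_cc(xa, xb)


def best_over_stations(pair, scores):
    best = max(scores)
    if best >= CC_THRESHOLD:
        yield (pair, best)


waveforms = [  # (event id, station, phase, samples); toy data
    ("ev1", "STA1", "P", [0, 1, 2, 1, 0]),
    ("ev2", "STA1", "P", [0, 1, 2, 1, 0]),
    ("ev3", "STA1", "P", [2, 0, 0, 0, 2]),
    ("ev1", "STA2", "P", [1, 0, 1, 0, 1]),
    ("ev2", "STA2", "P", [0, 1, 0, 1, 0]),
]
pairs = run_job(waveforms, key_by_station_phase, emit_pairs)
correlated = run_job(pairs, correlate_pair, best_over_stations)
print(len(pairs), correlated)
```

Each tier reads the previous tier's full output and writes a new intermediate dataset; here tier 1 copies the samples into every pair record, inflating IO in exchange for many small map tasks that can run independently, which is the trade-off the highlights describe.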
