Earthquake Fingerprints: Extracting Waveform Features for Similarity-Based Earthquake Detection

Seismologists are increasingly adopting data mining and machine learning techniques to detect weak earthquake signals in large seismic data sets. The detection performance of these methods, especially their sensitivity and false detection rate, depends on the choice of feature representation for the waveform data. We previously introduced Fingerprint and Similarity Thresholding (FAST), a method for waveform-similarity-based earthquake detection that uses a pattern mining approach to detect earthquake signals without template waveforms. FAST has two key steps: fingerprint extraction and efficient indexing for similarity search. In this work, we focus on FAST fingerprint extraction, the step that maps short-duration waveforms to a set of features, called waveform fingerprints, that are then compared during detection. We describe the FAST fingerprint extraction method, a data-adaptive variation on the Waveprint audio fingerprinting method tailored to continuous seismic data, and compare its performance with existing fingerprinting techniques designed for audio identification. Because limited or incomplete event catalogs make it difficult to evaluate detection algorithms, we propose a framework for quantifying the performance of different fingerprint extraction methods in the context of blind similarity-based detection. The framework runs computational experiments on benchmark data sets, constructed with known event waveforms, to compute a measure of fingerprint effectiveness. Using this framework, we show that, among the audio fingerprinting schemes considered in this work, the FAST fingerprint extraction method achieves the most consistent performance in distinguishing similar, low signal-to-noise ratio earthquake waveforms from noise in waveform data sets from eight stations in the Northern California Seismic Network.
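As a minimal sketch of the fingerprint-extraction step (not the FAST implementation), the Python example below follows the published outline of Waveprint: compute a spectrogram, cut it into overlapping spectral images, apply a 2D Haar wavelet transform, keep the top-k coefficients, and encode only their signs as a sparse binary fingerprint. The data-adaptive step is an assumption here: each wavelet coefficient is standardized by its median and median absolute deviation (MAD) across all spectral images before the top k are chosen, so that coefficients that are routinely large in background noise are not favored. The function names, the parameters NFFT, HOP, IMG_T, IMG_LAG and top_k, and the synthetic trace are all hypothetical.

```python
import numpy as np

# Hypothetical parameters, chosen only so the example runs quickly (not FAST defaults).
NFFT, HOP = 64, 16       # STFT frame length and hop (samples)
IMG_T, IMG_LAG = 32, 8   # spectral-image width and lag between images (STFT frames)


def spectral_images(trace: np.ndarray) -> np.ndarray:
    """Spectrogram of a continuous trace, cut into overlapping (NFFT//2 x IMG_T) images."""
    frames = np.lib.stride_tricks.sliding_window_view(trace, NFFT)[::HOP]
    # Magnitude spectrum per frame; drop the Nyquist bin to keep 32 (= 2**5) rows.
    spec = np.abs(np.fft.rfft(frames * np.hanning(NFFT), axis=1))[:, : NFFT // 2].T
    starts = range(0, spec.shape[1] - IMG_T + 1, IMG_LAG)
    return np.stack([spec[:, s : s + IMG_T] for s in starts])


def haar_1d(x: np.ndarray) -> np.ndarray:
    """Full orthonormal Haar transform along the last axis (length must be 2**n)."""
    out = np.array(x, dtype=float)
    length = out.shape[-1]
    while length > 1:
        half = length // 2
        even, odd = out[..., 0:length:2], out[..., 1:length:2]
        avg, diff = (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)
        out[..., :half], out[..., half:length] = avg, diff
        length = half
    return out


def haar_2d(images: np.ndarray) -> np.ndarray:
    """Standard 2D Haar decomposition: full 1D transform along time, then frequency."""
    rows = haar_1d(images)
    return np.swapaxes(haar_1d(np.swapaxes(rows, -1, -2)), -1, -2)


def fingerprints(coeffs: np.ndarray, top_k: int) -> np.ndarray:
    """Binary fingerprints, one row per spectral image.

    Assumed data-adaptive step: standardize every coefficient by its median and MAD
    over all images in the data set, then keep the top_k largest |z| per image.
    """
    flat = coeffs.reshape(len(coeffs), -1)
    med = np.median(flat, axis=0)
    mad = np.median(np.abs(flat - med), axis=0) + 1e-12  # guard against zero MAD
    z = (flat - med) / mad
    keep = np.argsort(-np.abs(z), axis=1)[:, :top_k]      # indices of the top_k |z|
    kept = np.take_along_axis(z, keep, axis=1)
    rows = np.arange(len(z))[:, None]
    bits = np.zeros((len(z), 2 * z.shape[1]), dtype=bool)
    bits[rows, 2 * keep] = kept > 0      # "10": kept coefficient is positive
    bits[rows, 2 * keep + 1] = kept < 0  # "01": negative; dropped ones stay "00"
    return bits


def jaccard(a: np.ndarray, b: np.ndarray) -> float:
    """Jaccard similarity of two binary fingerprints (the quantity MinHash estimates)."""
    union = np.count_nonzero(a | b)
    return np.count_nonzero(a & b) / union if union else 1.0


# Usage: synthetic noise with one repeated event at two image-aligned offsets.
rng = np.random.default_rng(0)
trace = 0.1 * rng.standard_normal(20_000)
event = rng.standard_normal(512) * np.exp(-np.arange(512) / 100)
for start in (4096, 12288):  # multiples of HOP * IMG_LAG = 128 samples
    trace[start : start + 512] += event
imgs = spectral_images(trace)
fps = fingerprints(haar_2d(imgs), top_k=100)
# Image 32 and image 96 start at the two event onsets; image 60 is noise only.
print(jaccard(fps[32], fps[96]), jaccard(fps[32], fps[60]))
```

Comparing fingerprints with Jaccard similarity mirrors the similarity-search step, since MinHash-based locality-sensitive hashing estimates exactly this quantity. In the usage example, the two windows that contain the repeated event should score far higher than an event window paired with a noise-only window; the exact values depend on the random seed and the assumed parameters.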
