Outlier detection from a mixture distribution when training data are unlabeled

Abstract: We consider the difficult task of using seismic signals (or any other discriminants) to detect nuclear explosions among the large number of background signals such as earthquakes and mining blasts. Given a ground-truth database (i.e., labeled data), Fisk et al. (1996) consider the problem of detecting outliers (nuclear explosions) from a single background-signal population, and their approach has been applied successfully in several regions around the world. Wang et al. (1997) instead model the background as a mixture distribution and look for outliers (nuclear events) from that mixture. However, those authors considered only the case in which at least some fraction of the training sample was labeled (that is, some ground-truth information was available) and the number of distinct classes of events was known. In the current article, we extend these results to the case in which no events in the training sample are labeled and to the case in which the number of event types represented in the training sample is unknown. The mixture approach can also be viewed as a robust method for fitting a density to training data that may not be normally distributed, whether or not the data consist of identifiable components that have a physical interpretation. The technique is demonstrated using simulated data as well as two sets of seismic data.
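The general idea in the abstract (fit a mixture to unlabeled background data, choose the number of components from the data, then flag events that the fitted mixture makes improbable) can be illustrated with a minimal sketch. This is not the authors' procedure: it is restricted to one dimension, it picks the number of components with AIC as one plausible choice, and it swaps in a simple low-density threshold in place of a formal likelihood ratio outlier test. All function names, the simulated data, and the 1% threshold are assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, params):
    """Density of a normal mixture; params is a list of (weight, mean, var)."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in params)

def fit_mixture(data, k, n_iter=100, min_var=1e-3):
    """Fit a k-component 1-D normal mixture by EM; returns (params, loglik)."""
    n = len(data)
    srt = sorted(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    # Deterministic start: means at evenly spaced quantiles, pooled variance.
    params = [(1.0 / k, srt[int((j + 0.5) * n / k)], var_all) for j in range(k)]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in params]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and (floored) variances.
        new_params = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            m = sum(r[j] * x for r, x in zip(resp, data)) / nj
            v = sum(r[j] * (x - m) ** 2 for r, x in zip(resp, data)) / nj
            new_params.append((nj / n, m, max(v, min_var)))
        params = new_params
    loglik = sum(math.log(mixture_pdf(x, params) + 1e-300) for x in data)
    return params, loglik

def select_by_aic(data, max_k=4):
    """Try k = 1..max_k and keep the fit with the smallest AIC."""
    best = None
    for k in range(1, max_k + 1):
        params, loglik = fit_mixture(data, k)
        n_free = 3 * k - 1  # k means, k variances, k - 1 free weights
        aic = -2.0 * loglik + 2.0 * n_free
        if best is None or aic < best[0]:
            best = (aic, k, params)
    return best[1], best[2]

def density_threshold(data, params, alpha=0.01):
    """Empirical alpha-quantile of the fitted density over the training data."""
    dens = sorted(mixture_pdf(x, params) for x in data)
    return dens[int(alpha * len(dens))]

def is_outlier(x, params, threshold):
    """Flag x when the fitted background density at x falls below the threshold."""
    return mixture_pdf(x, params) < threshold

# Simulated unlabeled background made of two event types (illustrative values).
random.seed(0)
train = ([random.gauss(0.0, 1.0) for _ in range(300)]
         + [random.gauss(8.0, 1.0) for _ in range(300)])
k, params = select_by_aic(train)
threshold = density_threshold(train, params)
```

Under these assumptions, a new observation far from both background clusters, for example `is_outlier(20.0, params, threshold)`, is flagged, while one near the center of either cluster is not. A single normal fitted to the same data would spread its mass across the gap between the clusters, which is the motivation for modeling the background as a mixture.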

[1] R. Redner, et al. Mixture densities, maximum likelihood, and the EM algorithm, 1984.

[2] William R. Walter, et al. Phase and spectral ratio discrimination between NTS earthquakes and explosions. Part I: Empirical observations, 1995.

[3] G. McLachlan. On Bootstrapping the Likelihood Ratio Test Statistic for the Number of Components in a Normal Mixture, 1987.

[4] Jan Wüster, et al. Discrimination of chemical explosions and earthquakes in central Europe: a case study, 1993.

[5] Anil K. Jain, et al. Algorithms for Clustering Data, 1988.

[6] H. L. Gray, et al. Testing for the maximum mean in a mixture of normals, 1989.

[7] Steven Bottone, et al. Event Characterization Development and Analysis at the Prototype IDC, 2000.

[8] F. Ryall, et al. CSS Ground-Truth Database: Version 1 Handbook, 1993.

[9] Richard A. Redner, et al. The Akaike information criterion and its application to mixture proportion estimation, 1982.

[10] Stanley L. Sclove, et al. Application of the Conditional Population-Mixture Model to Image Segmentation, 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] E. Hannan. The Estimation of the Order of an ARMA Process, 1980.

[12] G. J. McLachlan, et al. The classification and mixture maximum likelihood approaches to cluster analysis, 1982, Classification, Pattern Recognition and Reduction of Dimensionality.

[13] Fisk, et al. Preliminary assessment of seismic CTBT/NPT monitoring capability, 1994.

[14] Vincent Kanade, et al. Clustering Algorithms, 2021, Wireless RF Energy Transfer in the Massive IoT Era.

[15] H. Akaike. A new look at the statistical model identification, 1974.

[16] Stephan R. Sain, et al. A New Test for Outlier Detection from a Multivariate Mixture Distribution, 1997.

[17] Hirotugu Akaike, et al. On entropy maximization principle, 1977.

[18] G. R. Dargahi-Noubary. Stochastic modeling and identification of seismic records based on established deterministic formulations, 1995.

[19] Steven R. Taylor, et al. An evaluation of generalized likelihood ratio outlier detection to identification of seismic events in Western China, 1996.

[20] D. Tjøstheim. Some autoregressive models for short-period seismic noise, 1975, Bulletin of the Seismological Society of America.

[21] E. Forgy, et al. Cluster analysis of multivariate data: efficiency versus interpretability of classifications, 1965.

[22] William Scott Phillips, et al. A preliminary study of regional seismic discrimination in central Asia with emphasis on western China, 1996, Bulletin of the Seismological Society of America.

[23] G. McLachlan, et al. The EM algorithm and extensions, 1996.

[24] H. Bozdogan, et al. Multi-sample cluster analysis using Akaike's Information Criterion, 1984.

[25] Henry L. Gray, et al. Regional event discrimination without transporting thresholds, 1996.