Detection of Outliers in Geochemical Data Using Ensembles of Subsets of Variables

Geochemical data used in the geological interpretation of mine deposits and the identification of geological domains often contain outliers. Making statistically sound and robust decisions about outliers (such as deciding whether the observations under consideration belong to a given domain) can be challenging. Traditional statistical procedures are often poorly suited to the noisy, intrinsically multivariate and high-dimensional nature of geochemical data. We present a novel approach for robustly detecting outliers in large multi-dimensional geochemical data. The approach incorporates a feature selection method that automatically seeks the subset of chemical ratios that, together with the original chemical variables, best represents the inherent characteristics of the data. The proposed approach robustly distinguishes outliers even at high contamination levels. Experimental results on data from an iron ore deposit located in the Brockman Iron Formation of Hamersley Province, Western Australia, demonstrate the advantages of the proposed feature selection algorithm over previous methods used in outlier detection.
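The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of scoring samples over an ensemble of variable subsets that includes chemical ratios. It is not the authors' method. It assumes random subsets in place of the paper's feature selection, a simple median/MAD robust z-score in place of a robust multivariate estimator, and a majority vote across subsets. The function names, the cutoff of 3.5, and the toy assay values are all hypothetical and chosen purely for illustration.

```python
import itertools
import math
import random
import statistics


def robust_z(values):
    """Absolute robust z-scores based on the median and the scaled MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad if mad > 0 else 1e-12  # guard against zero spread
    return [abs(v - med) / scale for v in values]


def build_features(samples, names):
    """Log of each original variable plus every pairwise log-ratio."""
    feats = {n: [math.log(s[n]) for s in samples] for n in names}
    for a, b in itertools.combinations(names, 2):
        feats[f"{a}/{b}"] = [math.log(s[a] / s[b]) for s in samples]
    return feats


def ensemble_outliers(samples, names, n_subsets=25, subset_size=3,
                      cutoff=3.5, min_vote=0.5, seed=0):
    """Flag samples that look extreme in more than `min_vote` of the subsets.

    Assumption: subsets are drawn at random here; the paper instead
    selects them with its own feature selection method.
    """
    rng = random.Random(seed)
    scores = {k: robust_z(col)
              for k, col in build_features(samples, names).items()}
    keys = sorted(scores)
    votes = [0] * len(samples)
    for _ in range(n_subsets):
        subset = rng.sample(keys, subset_size)
        for i in range(len(samples)):
            # A sample is flagged by this subset if any feature is extreme.
            if max(scores[k][i] for k in subset) > cutoff:
                votes[i] += 1
    return [i for i, v in enumerate(votes) if v / n_subsets > min_vote]


# Hypothetical toy assays (weight percent); NOT data from the paper.
names = ["Fe", "SiO2", "Al2O3"]
samples = [
    {"Fe": 60.0, "SiO2": 3.0, "Al2O3": 1.80},
    {"Fe": 60.5, "SiO2": 3.2, "Al2O3": 2.00},
    {"Fe": 61.0, "SiO2": 3.4, "Al2O3": 2.20},
    {"Fe": 61.5, "SiO2": 3.6, "Al2O3": 1.90},
    {"Fe": 62.0, "SiO2": 3.8, "Al2O3": 2.10},
    {"Fe": 62.5, "SiO2": 4.0, "Al2O3": 2.00},
    {"Fe": 63.0, "SiO2": 3.5, "Al2O3": 1.85},
    {"Fe": 61.2, "SiO2": 3.3, "Al2O3": 2.15},
    {"Fe": 30.0, "SiO2": 40.0, "Al2O3": 10.0},  # deliberately anomalous
]
flagged = ensemble_outliers(samples, names)
print(flagged)
```

On this toy set only the last sample (index 8) is flagged, because it is extreme in every log variable and log-ratio while the other samples stay well under the cutoff. Voting across subsets is one simple way to keep a single noisy variable from deciding the outcome; a fuller treatment would replace the per-feature z-score with a robust multivariate distance.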
