Outlier ensembles: A robust method for damage detection and unsupervised feature extraction from high-dimensional data

Abstract: Outlier ensembles are shown to provide a robust method for damage detection and dimension reduction within a wholly unsupervised framework. Most notably, when used for feature extraction, the proposed heuristic defines features that give near-equivalent classification performance (95.85%) to features found in previous work through a supervised technique, specifically a genetic algorithm (97.39%). This is significant for practical applications of structural health monitoring, where labelled data are rarely available during data mining. Ensemble analysis is applied to two case studies of problematic engineering data. Case study I illustrates how outlier ensembles can expose outliers hidden within a dataset. Case study II demonstrates how ensembles can be used as a tool for robust outlier analysis and feature extraction in a noisy, high-dimensional feature space.
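
To make the idea of an outlier ensemble concrete, the following is a minimal sketch and not the heuristic proposed in the paper. It assumes a feature-bagging scheme in which each ensemble member scores the data by squared Mahalanobis distance on a random subset of features, and the standardised scores are averaged. The function names (`mahalanobis_scores`, `ensemble_scores`), the subset-size rule, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def mahalanobis_scores(X):
    """Squared Mahalanobis distance of each row from the sample mean."""
    mu = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates a near-singular covariance
    diff = X - mu
    return np.einsum("ij,jk,ik->i", diff, inv, diff)


def ensemble_scores(X, n_members=50, seed=0):
    """Average z-scored member outputs over random feature subsets (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    total = np.zeros(n)
    for _ in range(n_members):
        k = rng.integers(max(1, d // 2), d + 1)      # subset size between d/2 and d
        cols = rng.choice(d, size=k, replace=False)  # random feature subset
        s = mahalanobis_scores(X[:, cols])
        total += (s - s.mean()) / (s.std() + 1e-12)  # standardise so members are comparable
    return total / n_members


# Synthetic illustration: 300 Gaussian observations in 10 dimensions,
# with the first three rows shifted to act as outliers.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
X[:3] += 6.0

scores = ensemble_scores(X)
flagged = np.argsort(scores)[-3:]  # indices of the three highest ensemble scores
```

On this synthetic data the three shifted rows receive the highest ensemble scores. The sketch uses the ordinary sample mean and covariance for brevity; a robust estimator could be substituted inside `mahalanobis_scores` without changing the ensemble logic.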
