Anomaly Detection for Industrial Big Data

As the Industrial Internet of Things (IIoT) grows, systems are increasingly monitored by arrays of sensors that return time-series data at ever-increasing 'volume, velocity and variety' (i.e. Industrial Big Data). An obvious use for these data is real-time system condition monitoring and prognostic time-to-failure analysis (remaining useful life, RUL); see, for example, the white papers by Senseye.io and the output of the NASA Prognostics Center of Excellence (PCoE). However, as noted by Agrawal and Choudhary, 'Our ability to collect "big data" has greatly surpassed our capability to analyze it, underscoring the emergence of the fourth paradigm of science, which is data-driven discovery.' To fully utilize the potential of Industrial Big Data, we need data-driven techniques that operate at scales that process models cannot. Here we present a prototype technique for data-driven anomaly detection designed to operate at industrial scale. The method is based on independent ordinations of repeated (bootstrapped) partitions of the dataset and inspection of the joint distribution of ordinal distances, and it generalizes to almost any multivariate dataset.
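The abstract does not give implementation details, so the following is only a minimal sketch of one way the general idea could be read, not the authors' method. It assumes that PCA stands in for the ordination, that random subsets drawn without replacement stand in for the bootstrapped partitions, and that each observation's Euclidean distance from the origin of the ordination space stands in for the ordinal distance. The function name, parameters, and synthetic data are all hypothetical.

```python
import numpy as np


def bootstrap_ordination_scores(X, n_rounds=200, frac=0.5, n_components=2, seed=0):
    """Average distance of each row from the ordination origin over random subsets.

    Assumptions (not from the source): PCA as the ordination, subsets drawn
    without replacement, Euclidean distance in the first `n_components` axes.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = max(n_components + 1, int(frac * n))  # subset size
    sums = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(n_rounds):
        # Draw one random subset of row indices (unique, so += below is safe).
        idx = rng.choice(n, size=m, replace=False)
        sub = X[idx]
        # Standardize each variable within the subset.
        mu = sub.mean(axis=0)
        sd = sub.std(axis=0)
        sd[sd == 0] = 1.0
        Z = (sub - mu) / sd
        # Independent ordination of this subset: PCA via SVD.
        U, S, _ = np.linalg.svd(Z, full_matrices=False)
        pcs = U[:, :n_components] * S[:n_components]
        # Distance of each sampled row from the centre of the ordination.
        dist = np.linalg.norm(pcs, axis=1)
        sums[idx] += dist
        counts[idx] += 1
    # Mean distance over the rounds in which each row was sampled.
    return sums / np.maximum(counts, 1)


# Synthetic demonstration: 300 Gaussian rows with one planted anomaly at row 0.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(300, 5))
X_demo[0] = 10.0
scores = bootstrap_ordination_scores(X_demo)
print("highest-scoring row:", int(np.argmax(scores)))
```

Averaging over many independently ordinated subsets means that no single ordination determines a point's score. Inspecting the full distribution of per-round distances, rather than only their mean, would be closer in spirit to the joint-distribution inspection the abstract mentions.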
