An angle-based subspace anomaly detection approach to high-dimensional data: With an application to industrial fault detection

The accuracy of traditional anomaly detection techniques applied to the full-dimensional space degrades significantly as dimensionality increases, which hampers many real-world applications. This work proposes an approach that selects a meaningful feature subspace and conducts anomaly detection in the corresponding subspace projection, with the aim of maintaining detection accuracy in high-dimensional settings. For a given anomaly candidate, the approach assesses the angle between two lines: the first connects the data point to the center of its neighboring points, and the second is one of the axis-parallel lines. This angle is evaluated for every axis, and the dimensions that form a relatively small angle with the first line are chosen to constitute the candidate's axis-parallel subspace. A normalized Mahalanobis distance is then introduced to measure the local outlier-ness of the object in this subspace projection. To compare the proposed algorithm comprehensively with several existing anomaly detection techniques, we constructed artificial datasets with various high-dimensional settings, on which the algorithm showed superior accuracy. A further experiment on an industrial dataset demonstrated the applicability of the algorithm to fault detection tasks and highlighted another of its merits: it provides a preliminary interpretation of abnormality by ordering the features in the relevant subspaces.
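The abstract names the steps (neighbor center, point-to-axis angles, subspace selection, normalized Mahalanobis distance) but not their exact definitions. The Python sketch below is therefore only an illustration under loudly stated assumptions, not the authors' implementation. Neighbors are taken as the `k` nearest points by Euclidean distance. A dimension counts as having a "relatively small angle" when its angle is at most the mean angle over all dimensions. The squared Mahalanobis distance is divided by the subspace size as a hypothetical normalization. Selected dimensions are listed by increasing angle as one possible reading of "feature ordering". The function name `subspace_anomaly_scores` and the `ridge` parameter are hypothetical.

```python
import numpy as np


def subspace_anomaly_scores(X, k=10, ridge=1e-6):
    """Angle-based subspace outlier scores (illustrative sketch).

    Assumptions not taken from the paper: neighbors are the k nearest
    points by Euclidean distance; a dimension is kept when its angle is no
    larger than the mean angle over all dimensions; the squared Mahalanobis
    distance is divided by the subspace size (hypothetical normalization).
    Returns (scores, subspaces); subspaces[i] lists kept dimensions,
    smallest angle first.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 2 <= k < n:
        raise ValueError("k must satisfy 2 <= k < number of points")

    # Pairwise Euclidean distances; exclude each point from its own neighbors.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)

    scores = np.zeros(n)
    subspaces = []
    for i in range(n):
        nbrs = X[np.argsort(dists[i])[:k]]
        center = nbrs.mean(axis=0)
        v = X[i] - center  # first line: from the neighbor center to the point
        norm = np.linalg.norm(v)
        if norm == 0.0:
            # The point coincides with its neighbor center: no direction,
            # so no subspace is selected and the score stays 0.
            subspaces.append(np.empty(0, dtype=int))
            continue

        # Angle between the line along v and each axis-parallel line.
        # Using |v_j| folds the angle into [0, pi/2], since lines are undirected.
        angles = np.arccos(np.clip(np.abs(v) / norm, 0.0, 1.0))
        dims = np.flatnonzero(angles <= angles.mean())
        dims = dims[np.argsort(angles[dims])]  # feature ordering: smallest angle first
        subspaces.append(dims)

        # Mahalanobis distance of the point from the neighbor center,
        # using the neighbors' covariance in the selected dimensions only.
        cov = np.atleast_2d(np.cov(nbrs[:, dims], rowvar=False))
        cov += ridge * np.eye(len(dims))  # small ridge keeps cov invertible
        diff = v[dims]
        md2 = float(diff @ np.linalg.solve(cov, diff))
        scores[i] = md2 / len(dims)
    return scores, subspaces


# Usage: 300 Gaussian points in 10 dimensions, with point 0 shifted
# far away along dimensions 0 and 1 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
X[0, :2] += 10.0
scores, subspaces = subspace_anomaly_scores(X, k=30)
print("top-ranked point:", int(np.argmax(scores)))
print("its subspace (ordered):", subspaces[0].tolist())
```

In this toy setup the shifted point deviates only in dimensions 0 and 1, so those two axes form the smallest angles with its point-to-center line and lead its subspace. Its score is then computed in that projection rather than in all ten dimensions. The ridge term and the mean-angle threshold are choices made for this sketch, and the dense pairwise-distance array is only practical for small examples.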
