Outlier Detection: Applications And Techniques

Outliers, once regarded merely as noisy data in statistics, have become an important problem studied across diverse research fields and application domains. Many outlier detection techniques have been developed for specific application domains, while others are more generic. Some application domains, such as crime and terrorist activity, are researched in strict confidentiality, so the techniques and their results are not readily available. A number of surveys, research and review articles, and books cover outlier detection techniques in the machine learning and statistical domains individually in great detail. In this paper we attempt to bring together various outlier detection techniques in a structured and generic description. With this exercise, we hope to attain a better understanding of the different directions of research on outlier analysis, both for ourselves and for beginners in this field, who can then follow the links to the different application areas in detail.
