OFS-NN: Optimal Features-Neural Network Based Outlier Detection for Big Data Analysis

Outlier detection has applications across a large number of domains. Outliers were previously regarded as noisy data, but they have recently gained importance in various fields because they can reveal unpredicted and unidentified events. Outlier detection is used in areas such as credit card and calling card fraud detection, computer intrusion detection, and the discovery of criminal behavior. Aim: The main aim of this research is to propose an outlier detection approach based on feature selection and data subsets. Methods: This paper proposes the Optimal Feature Selection based Neural Network (OFS-NN), an effective outlier detection approach preceded by a feature optimization strategy. First, a preprocessing stage formats all data instances in the dataset, which is deployed on a Spark architecture; the preprocessed data are then divided into subsets. Next, Artificial Bee Colony optimization is employed to determine an optimal set of features from the complete feature set, and outliers are excluded on the basis of that feature set. An Expectation Maximization clustering approach then groups the most similar data. Finally, neural network classification is used for outlier detection. Results: The efficacy of OFS-NN for outlier detection is demonstrated by evaluating Area Under Curve (AUC), CPU utilization time, execution time, detection accuracy, and memory consumption against existing outlier detection methodologies. OFS-NN proves more efficient than the other approaches in terms of reduced execution time at both the minimum and maximum dataset sizes.
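The feature optimization step can be illustrated with a minimal sketch of a binary Artificial Bee Colony search over feature subsets. This is not the OFS-NN implementation: the abstract does not state the fitness function or colony settings, so the toy data, the `fitness` score (class-mean gap minus a per-feature penalty), and the parameters `n_sources`, `cycles`, and `limit` are all assumptions made for illustration.

```python
import random

# Toy labelled data (hypothetical): columns 0 and 2 separate the two classes,
# columns 1 and 3 have identical class means and act as noise.
ROWS = [
    [0.0, 1.0, 0.1, 3.0],
    [0.2, 2.0, 0.0, 1.0],
    [0.1, 3.0, 0.2, 2.0],
    [5.0, 1.0, 5.1, 3.0],
    [5.2, 2.0, 5.0, 1.0],
    [5.1, 3.0, 5.2, 2.0],
]
LABELS = [0, 0, 0, 1, 1, 1]  # 1 marks a known outlier


def fitness(mask, rows, labels, penalty=0.5):
    """Assumed score: for each selected feature, class-mean gap minus a penalty."""
    normal = [r for r, y in zip(rows, labels) if y == 0]
    outlier = [r for r, y in zip(rows, labels) if y == 1]
    total = 0.0
    for j, bit in enumerate(mask):
        if bit:
            mean_n = sum(r[j] for r in normal) / len(normal)
            mean_o = sum(r[j] for r in outlier) / len(outlier)
            total += abs(mean_n - mean_o) - penalty
    return total


def abc_select(rows, labels, n_sources=6, cycles=40, limit=8, seed=0):
    """Binary ABC search over feature masks; returns (best_mask, best_score)."""
    rng = random.Random(seed)
    n = len(rows[0])

    def random_mask():
        return [rng.randint(0, 1) for _ in range(n)]

    # Each food source is one candidate feature subset, encoded as a 0/1 mask.
    sources = [random_mask() for _ in range(n_sources)]
    scores = [fitness(m, rows, labels) for m in sources]
    trials = [0] * n_sources
    best = {"mask": None, "score": float("-inf")}

    def record(i):
        # Remember the best subset seen so far, even if its source is abandoned.
        if scores[i] > best["score"]:
            best["mask"], best["score"] = sources[i][:], scores[i]

    def try_neighbour(i):
        # Flip one random bit and keep the candidate only if it scores higher.
        cand = sources[i][:]
        j = rng.randrange(n)
        cand[j] = 1 - cand[j]
        s = fitness(cand, rows, labels)
        if s > scores[i]:
            sources[i], scores[i], trials[i] = cand, s, 0
            record(i)
        else:
            trials[i] += 1

    for i in range(n_sources):
        record(i)

    for _ in range(cycles):
        # Employed bees: one local move per food source.
        for i in range(n_sources):
            try_neighbour(i)
        # Onlooker bees: revisit sources with probability proportional to score.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_neighbour(i)
        # Scout bees: replace sources that failed to improve more than `limit` times.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_mask()
                scores[i] = fitness(sources[i], rows, labels)
                trials[i] = 0
                record(i)
    return best["mask"], best["score"]


mask, score = abc_select(ROWS, LABELS)
print("selected features:", [j for j, bit in enumerate(mask) if bit])
print("score:", round(score, 3))
```

Under this toy score, the only subset that maximizes the fitness is the one containing exactly features 0 and 2, since each noise column contributes only the penalty. In a pipeline like the one described, the selected subset would then feed the clustering and classification stages; a realistic fitness would more likely be a validation metric of the downstream detector than a class-mean gap.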
