An Anomaly Detection Based on Optimization

Anomaly detection is currently an important problem in many fields. The rapid growth of data volumes calls for tools that can process and analyze a wide variety of data types. Anomaly detection methods are designed to detect deviations of objects from normal behavior. However, it is difficult to select a single tool for all types of anomalies because of increasing computational complexity and the nature of the data. This paper proposes an improved optimization approach to clustering with a number of clusters known in advance, in which a weight is assigned to each data point. The aim is to show that weighting each data point improves the clustering solution. The proposed algorithm was compared with the k-means algorithm, and the quality of the clustering results was estimated using clustering evaluation metrics. Experimental results on three datasets show that the proposed algorithm detects anomalies more accurately than k-means. In particular, it outperforms k-means on the Australia (credit card applications) dataset according to the Purity, Mirkin, and F-measure metrics, and on the heart disease dataset according to the F-measure and variation of information metrics.
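As an illustration of two of the evaluation metrics named above, the following minimal sketch computes Purity and the Mirkin metric from a contingency table of predicted clusters against reference classes. It is not the authors' evaluation code: the function names `contingency`, `purity`, and `mirkin` are hypothetical, and normalizing the Mirkin metric by n^2 is an assumption (unnormalized variants are also in use).

```python
import numpy as np


def contingency(labels_true, labels_pred):
    """Count matrix: rows are predicted clusters, columns are reference classes."""
    _, pred_idx = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)
    _, true_idx = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    table = np.zeros((pred_idx.max() + 1, true_idx.max() + 1), dtype=np.int64)
    # Unbuffered add so that repeated (cluster, class) pairs are all counted.
    np.add.at(table, (pred_idx, true_idx), 1)
    return table


def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority class of their cluster."""
    table = contingency(labels_true, labels_pred)
    return table.max(axis=1).sum() / table.sum()


def mirkin(labels_true, labels_pred):
    """Mirkin metric, divided by n^2 (assumed normalization); 0 means identical partitions."""
    table = contingency(labels_true, labels_pred)
    n = table.sum()
    cluster_sq = (table.sum(axis=1) ** 2).sum()  # sum of squared cluster sizes
    class_sq = (table.sum(axis=0) ** 2).sum()    # sum of squared class sizes
    overlap_sq = (table ** 2).sum()              # sum of squared overlaps
    return (cluster_sq + class_sq - 2 * overlap_sq) / n ** 2


if __name__ == "__main__":
    reference = [0, 0, 0, 1, 1, 1]
    clustering = [0, 0, 1, 1, 1, 1]
    print("purity:", purity(reference, clustering))
    print("mirkin:", mirkin(reference, clustering))
```

Higher Purity and lower Mirkin values indicate closer agreement with the reference classes, so a comparison of two clusterings of the same dataset reads the two metrics in opposite directions.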
