Big Data: Controlling Fraud by Using Machine Learning Libraries on Spark

Continuous change and the high volume of computation involved in network data traffic have made it harder to analyze the data and to detect abnormal behavior within it. For this reason, big data solutions have gained importance. With the advancement of internet technologies and the digital age, cyber-attacks have increased steadily. The k-Means clustering algorithm is one of the most widely used algorithms in data mining. Clustering algorithms automatically divide data into smaller clusters or sub-clusters, placing statistically similar records in the same group. In this article, we use the k-Means method from the machine learning libraries on Spark to determine whether incoming network records represent normal behavior. We used 400 thousand network records obtained from the KDD Cup 1999 Data. With the k-Means method, we detected 10 abnormal behaviors among these 400 thousand records.
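The abstract does not say how a record is judged abnormal once the clusters are formed. One common reading, assumed here, is that records lying farthest from their nearest cluster centroid are flagged. The sketch below illustrates that idea in plain Python rather than with Spark's machine learning libraries, so it is not the authors' implementation. The function names (`nearest`, `kmeans`, `find_anomalies`), the seeding of centroids from the first `k` points, and the `top_n` cutoff are all hypothetical choices made for illustration.

```python
import math


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    dists = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=dists.__getitem__)
    return idx, dists[idx]


def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means over a list of numeric tuples.

    Assumption: centroids are seeded with the first k points so the
    result is deterministic; this is a simplification for the sketch.
    """
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: group every point with its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx, _ = nearest(p, centroids)
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def find_anomalies(points, centroids, top_n):
    """Return the top_n points farthest from their nearest centroid."""
    ranked = sorted(points, key=lambda p: nearest(p, centroids)[1], reverse=True)
    return ranked[:top_n]
```

Under this assumption, a call such as `find_anomalies(records, kmeans(records, k), 10)` would return the ten records that fit their clusters least well, where `records` stands for numeric feature vectors taken from the network data. How `k` and the cutoff are chosen depends on the data and is not specified in the abstract.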
