A Proposal of Scalable and Performing Implementation of Algorithms for Anomaly and Community Detection

This paper studies several standard techniques for detecting anomalies in internet networks, using machine learning algorithms, classification algorithms, and graph techniques running on SQL-MapReduce and SQL-Graph implementation platforms. The approach proves efficient for the study of large datasets. First, several algorithmic approaches and possible implementations were studied and experimentally tested to see how the SQL-MapReduce and SQL-Graph implementations process large-scale data. Second, the performance and scalability of the algorithms on large volumes of data were compared in order to choose the most appropriate ones for typical anomaly detection. The application scope is very broad: spam detection, crime detection, detection of mafia or terrorist communities, network intrusion detection, detection of malignant tumors in healthcare, fraud detection in banking transactions, identity theft detection, and so on. The main problem addressed in this paper is processing performance on large-scale data using an implementation based on Massively Parallel Processing (MPP), SQL-MapReduce, and SQL-Graph.
