Parallel implementing improved k-means applied for image retrieval and anomaly detection

Anomaly detection based on data mining is one of the key technologies for intelligent detection. K-means is a classic clustering algorithm that is efficient for anomaly detection. However, traditional K-means is sensitive to the selection of initial cluster centers: different initial values can produce different clustering results. We combine an improved DD algorithm with information entropy to improve the performance of K-means. The improved K-means optimizes the selection of initial cluster centers, automatically determines the number of clusters, and outputs stable clustering results. With PCA preprocessing, the adaptability of the improved K-means increases markedly. To reduce the processing time for massive data sets, we adopt cloud computing and modify the algorithm for parallel processing. We analyze the performance of the improved K-means on two data sets, KDD Cup99 and a public mobile malware data set (MalGenome). The experimental results show that the improved K-means produces accurate results and can be applied to anomaly detection in mobile networks. The improved K-means can also be applied to image retrieval by calculating the similarity between images.
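The sensitivity to initial centers mentioned above can be illustrated with a minimal sketch. It implements plain (Lloyd's) K-means in standard-library Python, not the improved DD and information-entropy method of the paper, and it omits PCA and parallelization. The function names (`nearest`, `kmeans`, `sse`, `is_anomaly`), the toy data, and the distance threshold are all hypothetical and chosen only for illustration.

```python
import math

def nearest(point, centers):
    """Return (index, distance) of the center closest to point."""
    dists = [math.dist(point, c) for c in centers]
    idx = min(range(len(centers)), key=dists.__getitem__)
    return idx, dists[idx]

def kmeans(points, init_centers, max_iter=100):
    """Plain Lloyd's K-means started from the given initial centers."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx, _ = nearest(p, centers)
            clusters[idx].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            for members, old in zip(clusters, centers)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(nearest(p, centers)[1] ** 2 for p in points)

def is_anomaly(point, centers, threshold):
    """Flag a point whose distance to every center exceeds threshold."""
    return nearest(point, centers)[1] > threshold

# Toy data (hypothetical): corners of a 4-by-1 rectangle, k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

good = kmeans(points, [(0, 0), (4, 0)])  # splits left / right
bad = kmeans(points, [(2, 0), (2, 1)])   # stuck splitting bottom / top

print("good start:", good, "SSE =", sse(points, good))
print("bad start: ", bad, "SSE =", sse(points, bad))
print("anomalous (2, 5)?", is_anomaly((2, 5), good, threshold=2.0))
```

Both runs converge, but to different partitions with very different errors, which is the instability that a better choice of initial centers is meant to remove. The threshold test at the end is one simple way to turn cluster centers into an anomaly detector; the paper may use a different criterion.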
