Density-Ratio Peak Based Semi-Supervised Algorithm for Access Network User Behavior Analysis

To improve the prediction accuracy of access network user behavior (ANUB), we propose a novel density-ratio peak (DRP)-based semi-supervised algorithm. It first rescales a dataset containing clusters of non-uniform density by density-ratio estimation (DRE) and then classifies subscribers in detail with the density peak (DP) algorithm. The proposed DRP algorithm can identify all clusters in a dataset with greatly varying densities. Next, a semi-supervised algorithm employs three regression prediction methods as typical representatives, namely the autoregressive moving average (ARMA), the autoregressive integrated moving average (ARIMA), and the fractionally autoregressive integrated moving average (FARIMA), to generate an accurate predictor for the ANUB and to establish a prediction model for each subcluster. The behaviors of access network users in the same subcategory share similarities, and aggregating the behavior statistics of all access network users in a district yields a model of the district's network properties, for example a prediction of district network traffic that is more detailed than a direct prediction. The proposed model is evaluated on an ANUB dataset collected from China Telecom, and the results show that the integrated model, building on DRP clustering, improves prediction accuracy compared with conventional methods.
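
The clustering stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the density-ratio estimator or the peak-selection rule, so the code assumes a simple neighbourhood-count ratio (points within a radius eps divided by points within a larger radius eta) and uses that ratio in place of the raw density in a DP-style procedure. The names drp_cluster, density_ratio, eps, eta and n_clusters are hypothetical, and the per-subcluster ARMA/ARIMA/FARIMA predictors are not covered.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix (O(n^2); fine for a small illustration)."""
    return [[math.dist(p, q) for q in points] for p in points]


def density_ratio(dist, eps, eta):
    """Assumed estimator: neighbours within eps divided by neighbours within eta."""
    ratios = []
    for row in dist:
        inner = sum(1 for d in row if d <= eps)  # counts the point itself
        outer = sum(1 for d in row if d <= eta)  # eta > eps, so outer >= inner >= 1
        ratios.append(inner / outer)
    return ratios


def drp_cluster(points, eps, eta, n_clusters):
    """Density-peak style clustering driven by the density ratio (a sketch)."""
    n = len(points)
    dist = pairwise_distances(points)
    score = density_ratio(dist, eps, eta)

    # Visit points from highest to lowest score; ties keep input order.
    order = sorted(range(n), key=lambda i: -score[i])
    rank = {i: r for r, i in enumerate(order)}

    # delta[i]: distance to the nearest point that precedes i in the ordering.
    delta = [0.0] * n
    parent = [-1] * n
    for r, i in enumerate(order):
        if r == 0:
            delta[i] = max(dist[i])  # top-ranked point: use its largest distance
            continue
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Peaks are the points with the largest score * delta product.
    gamma = [score[i] * delta[i] for i in range(n)]
    peaks = sorted(range(n), key=lambda i: (-gamma[i], rank[i]))[:n_clusters]

    # Peaks get fresh labels; every other point inherits its parent's label.
    labels = [-1] * n
    for c, p in enumerate(peaks):
        labels[p] = c
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, score


# Toy data (made up for illustration): one dense and one sparse 3x3 grid.
dense = [(0.1 * x, 0.1 * y) for x in range(3) for y in range(3)]
sparse = [(9.0 + x, 9.0 + y) for x in range(3) for y in range(3)]
labels, score = drp_cluster(dense + sparse, eps=1.5, eta=3.0, n_clusters=2)
```

In this toy data the centre of the sparse grid receives the same ratio score as the points of the dense grid, although its absolute density is far lower. That is the intuition behind using a ratio when cluster densities vary widely. The radii and the number of clusters are chosen by hand here; a practical version would need a principled way to set them.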
