Recognizing and characterizing dynamics of cellular devices in cellular data network through massive data analysis

The clients used to access the Internet are increasingly shifting from desktop computers to cellular devices. To stay competitive in this rapidly changing market, operators, Internet service providers, and application developers need to be able to recognize the models of cellular devices and to understand the traffic dynamics of cellular data networks. In this paper, we propose a novel Jaccard measurement-based method for recognizing cellular device models from network traffic data. The method is implemented as a scalable, parallel MapReduce program and achieves a high accuracy of 91.5% in an evaluation with 2.9 billion traffic records collected from a real network. Based on the recognition results, we conduct a comprehensive study of three characteristics of network traffic from the device model perspective: network access time, traffic volume, and diurnal patterns. The analysis shows that the distribution of network access time can be modeled by a two-component Gaussian mixture model, and that the distribution of traffic volumes is highly skewed and follows a power law. In addition, seven distinct diurnal patterns of cellular device usage are identified by applying an unsupervised clustering algorithm to the collected traffic data. Copyright © 2014 John Wiley & Sons, Ltd.
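The abstract does not say which features of a traffic record are compared, so the following is only a minimal sketch of how a Jaccard-based match could work, not the paper's implementation. It assumes that each record carries a free-text device string (for example, a User-Agent header) and that a list of known model names is available. The function names, the tokenization rule, and the threshold value are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters into a set of tokens."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; taken to be 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def recognize(record, known_models, threshold=0.1):
    """Return (best_model, score) for the known model whose tokens are most
    similar to the record; best_model is None if the score is below threshold.

    Assumption: `threshold` is an illustrative cut-off, not a value from the paper.
    """
    tokens = tokenize(record)
    best_model, best_score = None, 0.0
    for model in known_models:
        score = jaccard(tokens, tokenize(model))
        if score > best_score:
            best_model, best_score = model, score
    if best_score < threshold:
        return None, best_score
    return best_model, best_score


if __name__ == "__main__":
    # Hypothetical record and model list, for illustration only.
    record = "Mozilla/5.0 (Linux; Android 4.0; GT-I9100 Build)"
    models = ["GT-I9100", "GT-N7000", "iPhone 4S"]
    print(recognize(record, models))
```

Because each record is scored independently of the others, a function like `recognize` could in principle be applied inside the map phase of a MapReduce job, with the reduce phase aggregating counts per recognized model. That arrangement is also an assumption rather than a description of the authors' program.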
