Identifying user habits through data mining on call data records

In this paper we propose a framework for identifying patterns and regularities in the pseudo-anonymized Call Data Records (CDR) pertaining to a generic subscriber of a mobile operator. We face the challenging task of automatically deriving meaningful information from the available data by means of an unsupervised cluster analysis procedure, without including in the model any a priori knowledge of the application context. The results of cluster mining are used to understand users' habits and to draw their characterizing profiles. We propose two implementations of the data mining procedure. The first is based on LD-ABCD, a novel system for cluster and knowledge discovery that retrieves clusters and, at the same time, automatically discovers the most appropriate dissimilarity measure (local metric) for each returned cluster. The second is based on PROCLUS, the well-known projected (subspace) clustering algorithm. Since the records in the dataset under analysis are characterized by only a few features, we show how to generate additional fields that describe information implicit in the data. Finally, we propose an effective graphical representation of the results of the data mining procedure, which analysts can easily understand and use in practical applications.
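The idea of a cluster-specific (local) dissimilarity can be illustrated with a minimal sketch. This is not LD-ABCD: the abstract does not describe how that system searches for clusters or metrics. The sketch only shows how, once each cluster carries its own feature weights, a record is compared with every cluster under that cluster's own metric. All names, features, representatives and weights below are hypothetical. The three features are assumed to be hour of day scaled to [0, 1], call duration in hours, and a weekend flag. The weights are assumed to be normalized to sum to 1 so that distances under different local metrics stay on a comparable scale.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    """Hypothetical cluster: a representative record plus its own feature weights.

    The weights define a cluster-specific (local) weighted Euclidean metric:
    a weight of 0 ignores a feature, larger weights emphasise it.
    """
    name: str
    representative: List[float]
    weights: List[float]

    def __post_init__(self) -> None:
        if len(self.weights) != len(self.representative):
            raise ValueError("weights and representative must have the same length")
        if any(w < 0 for w in self.weights) or sum(self.weights) == 0:
            raise ValueError("weights must be non-negative and not all zero")
        total = sum(self.weights)
        # Assumption of this sketch: normalise weights to sum to 1 so that
        # distances computed under different local metrics are comparable.
        self.weights = [w / total for w in self.weights]


def weighted_euclidean(x: Sequence[float], y: Sequence[float],
                       w: Sequence[float]) -> float:
    """Weighted Euclidean distance with non-negative per-feature weights."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def assign(record: Sequence[float], clusters: List[Cluster]) -> str:
    """Return the name of the cluster whose local metric gives the smallest distance."""
    best = min(clusters,
               key=lambda c: weighted_euclidean(record, c.representative, c.weights))
    return best.name


# Hypothetical clusters: features are [hour/24, duration in hours, is_weekend].
clusters = [
    # Evening calls of moderate length; the weekend flag is ignored.
    Cluster("evening_long_calls", [0.8, 0.6, 0.0], [1.0, 1.0, 0.0]),
    # Weekend activity; only the weekend flag matters.
    Cluster("weekend_activity", [0.3, 0.1, 1.0], [0.0, 0.0, 1.0]),
]

if __name__ == "__main__":
    print(assign([0.8, 0.5, 0.0], clusters))   # weekday evening call
    print(assign([0.4, 0.05, 1.0], clusters))  # short weekend morning call
```

Note that a cluster whose metric ignores most features, such as `weekend_activity` here, matches any record that agrees on the few features it keeps. How local weights are chosen and constrained therefore matters as much as the clustering itself.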
