Predicting complex user behavior from CDR based social networks

Abstract

Call Detail Record (CDR) datasets provide enough information about the personal interactions of cell phone service customers to build detailed social networks. We take one such dataset and build a realistic social network to predict which customers will default on payments for phone services, a complex behavior that combines social, economic, and legal considerations. After extracting a large feature set from this network, we find that each feature individually correlates only weakly with default status. Hence, we develop a more sophisticated model to make reliable predictions. Our main contribution is a methodology for building models of complex behavior from very large sets of diverse features, using several different methods to select the features that perform best in the final model. This approach identifies the most effective features for our problem, which, unexpectedly, are based on the number of unique users with whom a given user communicates around the Christmas and New Year's Eve holidays. In general, features based on the number of close ties a user maintains perform better than others. Our resulting models significantly outperform previously published methods. The paper also contributes a systematic analysis of the properties of the network derived from CDR data.
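The abstract outlines a pipeline: build a social network from CDR records, extract per-user features, and check how each feature relates to default status. The following Python sketch illustrates those first steps on toy data. The record layout `(caller_id, callee_id, call_date)`, the helper names, the close-tie weight threshold, and the holiday date window are all hypothetical choices made for illustration, not the paper's definitions. Ranking features by univariate (point-biserial) correlation is shown only as the kind of single-feature check that the abstract reports as weak; it is not the authors' feature-selection method.

```python
from collections import defaultdict
from datetime import date
from math import sqrt

# Assumption: each CDR record is reduced to (caller_id, callee_id, call_date).
# Real CDRs also carry timestamps, durations, cell IDs, and call/SMS type.


def build_contact_network(records):
    """Aggregate CDR records into an undirected weighted network.

    Edge weight = number of interactions between two users, in either direction.
    """
    weights = defaultdict(int)
    for caller, callee, _ in records:
        if caller == callee:
            continue  # ignore self-calls
        edge = (caller, callee) if caller < callee else (callee, caller)
        weights[edge] += 1
    return dict(weights)


def close_ties(network, min_weight):
    """Count, per user, the neighbors whose edge weight is at least min_weight.

    With min_weight=1 this is the plain degree; higher values keep only stronger ties.
    """
    counts = {}
    for (u, v), w in network.items():
        counts.setdefault(u, 0)
        counts.setdefault(v, 0)
        if w >= min_weight:
            counts[u] += 1
            counts[v] += 1
    return counts


def unique_contacts_in_window(records, start, end):
    """Number of distinct users each user interacted with between start and end (inclusive)."""
    contacts = defaultdict(set)
    for caller, callee, day in records:
        if caller != callee and start <= day <= end:
            contacts[caller].add(callee)
            contacts[callee].add(caller)
    return {user: len(peers) for user, peers in contacts.items()}


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 label this equals the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def rank_features(features, labels):
    """Rank features by |correlation| with the default label (1 = defaulted).

    Users missing from a feature's mapping get 0 (e.g. no contacts in the window).
    """
    users = sorted(labels)
    ys = [labels[u] for u in users]
    scores = {name: pearson([vals.get(u, 0) for u in users], ys)
              for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy data for illustration only; it is not drawn from the paper's dataset.
records = [
    ("a", "b", date(2016, 12, 24)),
    ("a", "c", date(2016, 12, 31)),
    ("b", "a", date(2017, 1, 1)),
    ("c", "d", date(2017, 3, 10)),
    ("d", "a", date(2017, 3, 11)),
]
network = build_contact_network(records)
features = {
    "degree": close_ties(network, min_weight=1),
    "close_ties": close_ties(network, min_weight=2),  # hypothetical threshold
    "holiday_contacts": unique_contacts_in_window(
        records, date(2016, 12, 20), date(2017, 1, 6)),  # hypothetical window
}
labels = {"a": 0, "b": 0, "c": 1, "d": 1}  # hypothetical default status

ranking = rank_features(features, labels)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

Because each feature is scored in isolation, such a ranking cannot capture interactions between features. When every single feature is only weakly correlated with the label, a multivariate model combined with more than one selection method is the natural next step.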
