Churn detection in large user networks

Anomaly detection on dynamic real-world networks, such as large caller networks and online social networks, is a very difficult problem, analogous to looking for a needle in a haystack. This paper considers detecting churners in a network of 3.7 million mobile phone users. The two main issues are designing fast, efficient features and designing fast, efficient classifiers; we address both. We associate every caller in the network with an activity vector and an affinity graph, and derive our features from activity levels computed over subgraphs of the affinity graph, so that the features reflect the graph-dependent nature of the problem. To compute these features efficiently, we extend the concept of integral images to integral affinity graphs. Our anomaly classifier is a cascade whose stages combine naive Bayes and decision tree classifiers. Simulations on this 3.7-million-user network show that the classifier reaches a churn detection rate of 71% at a false alarm rate of 0.8%.
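The integral-image idea that the integral affinity graph extends can be illustrated with a short sketch. An integral image (summed-area table) stores at each pixel the sum of all values above and to its left, so the sum over any rectangle costs four lookups regardless of its size. The abstract does not define the integral affinity graph, so the graph half of this sketch is a hypothetical analog, not the paper's construction: each caller's neighbors are ranked by decreasing affinity and prefix sums of their activity levels are precomputed, so the total activity of the subgraph formed by the k highest-affinity neighbors is a single lookup. All function names and the dict-based input layout are assumptions.

```python
from itertools import accumulate


def integral_image(img):
    """Summed-area table padded with a zero row and column:
    S[r][c] = sum of img[i][j] for all i < r and j < c."""
    rows, cols = len(img), len(img[0])
    S = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            S[r + 1][c + 1] = img[r][c] + S[r][c + 1] + S[r + 1][c] - S[r][c]
    return S


def rect_sum(S, r0, c0, r1, c1):
    """Sum of img over rows r0..r1-1 and columns c0..c1-1, in O(1)."""
    return S[r1][c1] - S[r0][c1] - S[r1][c0] + S[r0][c0]


# Hypothetical graph analog (an assumption, not the paper's definition):
# affinity = {caller: {neighbor: affinity_weight}}, activity = {caller: level}.
def build_integral_affinity(affinity, activity):
    """Per caller, rank neighbors by decreasing affinity and store prefix
    sums of their activity levels (a 1-D 'integral' over the ranking)."""
    table = {}
    for caller, nbrs in affinity.items():
        ranked = sorted(nbrs, key=nbrs.get, reverse=True)
        table[caller] = [0.0] + list(accumulate(activity[n] for n in ranked))
    return table


def top_k_activity(table, caller, k):
    """Total activity of the caller's k highest-affinity neighbors, in O(1)."""
    prefix = table[caller]
    return prefix[min(k, len(prefix) - 1)]


if __name__ == "__main__":
    S = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(rect_sum(S, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
    table = build_integral_affinity(
        {"a": {"b": 0.9, "c": 0.2, "d": 0.5}}, {"b": 10.0, "c": 1.0, "d": 4.0}
    )
    print(top_k_activity(table, "a", 2))  # b + d = 14.0
```

The design trade-off is the same as for images: a one-time pass (here, one sort per caller) buys constant-time queries for every subgraph size k, which matters when features must be evaluated for millions of callers.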
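A rejection cascade of the kind described in the abstract can be sketched in plain Python. In a cascade, popularized by the Viola-Jones face detector, each stage is trained only on the samples that passed every earlier stage, and a sample is flagged only if all stages accept it, so early stages discard most of the overwhelmingly non-churner population. The abstract does not say how the naive Bayes and decision tree outputs are combined within a stage; this sketch hypothetically averages their class-1 probabilities against a threshold, and it uses a depth-1 decision tree (a stump) instead of a full tree to stay short. The class and function names, `n_stages`, and `threshold` are illustrative assumptions.

```python
import math


class GaussianNB:
    """Minimal two-class Gaussian naive Bayes (features independent given class)."""

    def fit(self, X, y):
        self.stats = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            prior = (len(rows) + 1) / (len(X) + 2)  # Laplace-smoothed prior
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # small constant keeps variances positive for constant features
            varis = [sum((v - m) ** 2 for v in c) / len(c) + 1e-6 for c, m in zip(cols, means)]
            self.stats[label] = (prior, means, varis)
        return self

    def prob(self, x):
        """P(class 1 | x), computed from log-likelihoods for numerical stability."""
        logp = {}
        for label, (prior, means, varis) in self.stats.items():
            ll = math.log(prior)
            for v, m, s2 in zip(x, means, varis):
                ll -= 0.5 * math.log(2 * math.pi * s2) + (v - m) ** 2 / (2 * s2)
            logp[label] = ll
        top = max(logp.values())
        e0, e1 = math.exp(logp[0] - top), math.exp(logp[1] - top)
        return e1 / (e0 + e1)


class DecisionStump:
    """Depth-1 decision tree: one threshold on one feature; each leaf
    returns the fraction of class-1 training samples that reached it."""

    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                left = [t for x, t in zip(X, y) if x[j] <= thr]
                right = [t for x, t in zip(X, y) if x[j] > thr]
                # misclassifications if each leaf predicts its majority class
                err = sum(min(sum(s), len(s) - sum(s)) for s in (left, right))
                if best is None or err < best[0]:
                    best = (err, j, thr, self._rate(left), self._rate(right))
        _, self.j, self.thr, self.p_left, self.p_right = best
        return self

    @staticmethod
    def _rate(labels):
        return sum(labels) / len(labels) if labels else 0.5

    def prob(self, x):
        return self.p_left if x[self.j] <= self.thr else self.p_right


def stage_passes(stage, x, threshold):
    """HYPOTHETICAL combination rule: average the two class-1 probabilities."""
    nb, stump = stage
    return 0.5 * (nb.prob(x) + stump.prob(x)) >= threshold


def train_cascade(X, y, n_stages=3, threshold=0.3):
    """Train each stage only on the samples that passed all earlier stages."""
    stages, idx = [], list(range(len(X)))
    for _ in range(n_stages):
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(ys)) < 2:  # survivors are a single class: nothing left to learn
            break
        stage = (GaussianNB().fit(Xs, ys), DecisionStump().fit(Xs, ys))
        stages.append(stage)
        idx = [i for i in idx if stage_passes(stage, X[i], threshold)]
    return stages


def predict(stages, x, threshold=0.3):
    """Flag x (e.g. as a churner) only if every stage accepts it."""
    return bool(stages) and all(stage_passes(s, x, threshold) for s in stages)
```

On well-separated toy data a single stage already rejects every negative, so training stops after one stage. In the usual Viola-Jones recipe, each stage's threshold is instead set to keep nearly all positives while removing a fraction of the remaining negatives, so the overall false alarm rate shrinks multiplicatively with the number of stages.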
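The two figures of merit in the abstract's last sentence can be made concrete. Assuming the standard definitions, the detection rate is the fraction of actual churners that are flagged, TP / (TP + FN), and the false alarm rate is the fraction of non-churners that are flagged, FP / (FP + TN). The helper name below is hypothetical.

```python
def detection_and_false_alarm(y_true, y_pred):
    """Return (detection rate, false alarm rate) for binary labels,
    where 1 / True means churner (actual or predicted)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # churners flagged
    fn = sum(1 for t, p in pairs if t and not p)      # churners missed
    fp = sum(1 for t, p in pairs if not t and p)      # non-churners flagged
    tn = sum(1 for t, p in pairs if not t and not p)  # non-churners passed
    return tp / (tp + fn), fp / (fp + tn)


if __name__ == "__main__":
    # detection rate 3/4, false alarm rate 1/6
    print(detection_and_false_alarm([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                                    [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]))
```

For a purely hypothetical test set of 1,000 churners and 100,000 non-churners, the reported rates would mean about 710 churners caught and 800 non-churners wrongly flagged, which shows why the false alarm rate must be very low when churners are rare.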
