Subscriber classification within telecom networks utilizing big data technologies and machine learning

This paper describes a scalable solution for identifying influential subscribers in networks such as telecom networks. The solution estimates a single weighted influence value from several Social Network Analysis (SNA) metrics. The novel method for aggregating these metrics uses machine learning to train models. A prototype has been implemented on a Hadoop platform to support scalability and to reduce hardware cost by enabling the use of commodity computers. The SNA algorithms have been adapted to execute efficiently on the MapReduce distributed platform. The prototype has been tested on a Hadoop cluster, and the tests have verified that the solution can scale to networks with millions of subscribers. Both real data from a telecom network operator with 2.4 million subscribers and synthetic data for networks of up to 100 million subscribers have been used to verify the scalability and accuracy of the solution. The correlation between metrics has been analyzed to identify the information gain from each metric.
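As a rough illustration of the aggregation idea, the sketch below trains a small logistic regression model that combines several per-subscriber SNA metrics into one influence score between 0 and 1. The abstract does not state which learning algorithm, metrics, or labels the authors use, so everything here is an assumption made for illustration: the three metric columns (normalized degree, betweenness, clustering), the synthetic labels, and the helper names `train_logistic`, `influence_score`, and `make_synthetic` are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, epochs=500, lr=0.5):
    """Fit [bias, w1, ..., wn] by batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
            err = sigmoid(z) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def influence_score(weights, metrics):
    """Aggregate one subscriber's metric vector into a single value in (0, 1)."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], metrics))
    return sigmoid(z)


def make_synthetic(n, seed=0):
    """Hypothetical training data: three metrics scaled to [0, 1] and a
    made-up 'influential' label driven by the first two metrics only."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x = [rng.random() for _ in range(3)]  # degree, betweenness, clustering
        rows.append(x)
        labels.append(1 if 0.6 * x[0] + 0.4 * x[1] > 0.5 else 0)
    return rows, labels


rows, labels = make_synthetic(400)
weights = train_logistic(rows, labels)

# Score two hypothetical subscribers with the trained model.
strong = influence_score(weights, [0.9, 0.9, 0.5])
weak = influence_score(weights, [0.1, 0.1, 0.5])
```

The learned weights play the role of the per-metric weighting, and the model output is the single aggregated value. In a real deployment the labels would come from operator data rather than a synthetic rule, and the metric vectors would be produced by the distributed SNA jobs.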
