Telephone Call Network Data Mining: A Survey with Experiments

We survey some results of social network modeling and analysis relevant to telephone call networks and illustrate them on the call logs of major Hungarian telephone companies. Our unique data sets cover millions of users and a long time range, and include sufficiently detailed sociodemographic information on the users. We explore properties that give stronger intuition about how contacts arise within real social networks, and point out properties that current network evolution models leave unexplained.
