An Efficient and Spam-Robust Proximity Measure Between Communication Entities

Electronic communication service providers are required by local law to retain communication data for a certain period of time. The retained data, or communication logs, are used in applications such as crime detection, viral marketing, and analytical studies. Many of these applications rely on effective techniques for analyzing communication logs. In this paper, we focus on measuring the proximity between two communication entities, a fundamental step toward further analysis of communication logs, and propose a new proximity measure called ESP (Efficient and Spam-Robust Proximity measure). The proposed measure considers only the (graph-theoretically) shortest paths between two entities and assigns small values to shortest paths between spam-like entities and other entities. It is therefore both computationally efficient and spam-robust. Through experiments on real and synthetic datasets, we show that in most cases the proposed measure is more accurate, more computationally efficient, and more spam-robust than existing measures.
